Convolutional neural network
One popular and effective model for image classification is the convolutional neural network (CNN).
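As a rough illustration of the idea, the sketch below defines a small CNN classifier in PyTorch; the layer sizes and the 10-class output are arbitrary choices for illustration, not a reference architecture.

```python
import torch
from torch import nn

class SmallCNN(nn.Module):
    """Minimal CNN classifier: stacked conv/pool blocks followed by a linear head."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3-channel input -> 16 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                             # halve spatial resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> (N, 32, 1, 1)
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Example: classify a batch of 4 RGB images of size 32x32.
logits = SmallCNN()(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```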
The Residual Network (ResNet), introduced by researchers at Microsoft in 2015, is a particularly powerful type of CNN. Its main innovation is the use of "residual connections" (also called skip connections), which add a block's input to its output and make it feasible to train far deeper networks than before. ResNet achieved state-of-the-art results on the ImageNet dataset, a standard benchmark for image classification, at the time of its introduction and remains a widely used model.
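The core idea can be sketched as a residual block whose input is added back to its transformed output. The block below is a simplified version; the fixed channel count and the absence of downsampling are simplifications for illustration, not the exact ResNet configuration.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Simplified residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity shortcut lets gradients flow past the convolutions,
        # which is what makes very deep stacks of these blocks trainable.
        return torch.relu(self.body(x) + x)

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```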
Another recent development in CNNs is the use of attention mechanisms. Attention allows the model to focus on certain parts of the input rather than processing the entire input equally. One example is the Attention U-Net, which combines the U-Net architecture with attention mechanisms to improve image segmentation.
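A rough sketch of an additive attention gate in the spirit of Attention U-Net is shown below. For simplicity it assumes the gating signal and the skip-connection features already share the same spatial size, which is a simplification of the published design.

```python
import torch
from torch import nn

class AttentionGate(nn.Module):
    """Additive attention gate: scales skip features x by a mask computed from x and a gating signal g."""
    def __init__(self, x_channels: int, g_channels: int, inter_channels: int):
        super().__init__()
        self.theta_x = nn.Conv2d(x_channels, inter_channels, kernel_size=1)
        self.phi_g = nn.Conv2d(g_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # Combine the two signals, squash to a single-channel attention map in [0, 1],
        # then reweight the skip features so the decoder focuses on relevant regions.
        attn = torch.sigmoid(self.psi(torch.relu(self.theta_x(x) + self.phi_g(g))))
        return x * attn

x = torch.randn(1, 64, 32, 32)   # skip-connection features
g = torch.randn(1, 128, 32, 32)  # gating signal (assumed already resized to match)
print(AttentionGate(64, 128, 32)(x, g).shape)  # torch.Size([1, 64, 32, 32])
```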
Another active area of research is the use of Transformer architectures for image classification, an approach popularized by the Vision Transformer (ViT). ViT models have been shown to achieve high accuracy on image classification tasks and can match or exceed strong convolutional architectures, particularly when pretrained on large datasets.
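A minimal sketch of the ViT idea in PyTorch: split the image into patches, embed each patch as a token, prepend a class token, and run a standard Transformer encoder. The hyperparameters below are arbitrary and much smaller than any real ViT configuration.

```python
import torch
from torch import nn

class TinyViT(nn.Module):
    """Toy Vision Transformer: patch embedding + class token + Transformer encoder + linear head."""
    def __init__(self, image_size=32, patch_size=8, dim=64, depth=2, heads=4, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution both cuts the image into patches and projects them to `dim`.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)    # (N, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)            # one class token per image
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed  # add positional embeddings
        return self.head(self.encoder(tokens)[:, 0])               # classify from the class token

logits = TinyViT()(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```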