A ConvNet for the 2020s
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers
(ViTs), which quickly superseded ConvNets as the state-of-the-art image classification …
Conv2Former: A simple transformer-style ConvNet for visual recognition
Vision Transformers have been the most popular network architecture in visual recognition
recently due to their strong ability to encode global information. However, its high …
ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …
Visformer: The vision-friendly transformer
The past year has witnessed the rapid development of applying the Transformer module to
vision problems. While some researchers have demonstrated that Transformer-based …
LightViT: Towards light-weight convolution-free vision transformers
Vision transformers (ViTs) are usually considered to be less light-weight than convolutional
neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to …
ConvNets vs. Transformers: Whose visual representations are more transferable?
Vision transformers have attracted much attention from computer vision researchers as they
are not restricted to the spatial inductive bias of ConvNets. However, although Transformer …
Understanding robustness of transformers for image classification
S Bhojanapalli, A Chakrabarti… - Proceedings of the …, 2021 - openaccess.thecvf.com
Deep Convolutional Neural Networks (CNNs) have long been the architecture of
choice for computer vision tasks. Recently, Transformer-based architectures like Vision …
VOLO: Vision outlooker for visual recognition
Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. With
low efficiency in encoding fine-level features, the performance of ViTs is still inferior to the …
CMT: Convolutional neural networks meet vision transformers
Vision transformers have been successfully applied to image recognition tasks due to their
ability to capture long-range dependencies within an image. However, there are still gaps in …
MSG-Transformer: Exchanging local spatial information by manipulating messenger tokens
Transformers have offered a new methodology of designing neural networks for visual
recognition. Compared to convolutional networks, Transformers enjoy the ability of referring …