A ConvNet for the 2020s

Z Liu, H Mao, CY Wu, C Feichtenhofer… - Proceedings of the …, 2022 - openaccess.thecvf.com
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers
(ViTs), which quickly superseded ConvNets as the state-of-the-art image classification …

Conv2Former: A simple transformer-style ConvNet for visual recognition

Q Hou, CZ Lu, MM Cheng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformers have been the most popular network architecture in visual recognition
recently due to the strong ability to encode global information. However, its high …

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

Visformer: The vision-friendly transformer

Z Chen, L Xie, J Niu, X Liu, L Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
The past year has witnessed the rapid development of applying the Transformer module to
vision problems. While some researchers have demonstrated that Transformer-based …

LightViT: Towards light-weight convolution-free vision transformers

T Huang, L Huang, S You, F Wang, C Qian… - arXiv preprint arXiv …, 2022 - arxiv.org
Vision transformers (ViTs) are usually considered to be less light-weight than convolutional
neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to …

Convnets vs. transformers: Whose visual representations are more transferable?

HY Zhou, C Lu, S Yang, Y Yu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Vision transformers have attracted much attention from computer vision researchers as they
are not restricted to the spatial inductive bias of ConvNets. However, although Transformer …

Understanding robustness of transformers for image classification

S Bhojanapalli, A Chakrabarti… - Proceedings of the …, 2021 - openaccess.thecvf.com
Deep Convolutional Neural Networks (CNNs) have long been the architecture of
choice for computer vision tasks. Recently, Transformer-based architectures like Vision …

VOLO: Vision outlooker for visual recognition

L Yuan, Q Hou, Z Jiang, J Feng… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. With
low efficiency in encoding fine-level features, the performance of ViTs is still inferior to the …

CMT: Convolutional neural networks meet vision transformers

J Guo, K Han, H Wu, Y Tang, X Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision transformers have been successfully applied to image recognition tasks due to their
ability to capture long-range dependencies within an image. However, there are still gaps in …

MSG-Transformer: Exchanging local spatial information by manipulating messenger tokens

J Fang, L Xie, X Wang, X Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Transformers have offered a new methodology of designing neural networks for visual
recognition. Compared to convolutional networks, Transformers enjoy the ability of referring …