Scattering vision transformer: Spectral mixing matters

B Patro, V Agneeswaran - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Vision transformers have gained significant attention and achieved state-of-the-art
performance in various computer vision tasks, including image classification, instance …
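
The title points at mixing token information in the frequency domain rather than through attention. A minimal sketch of that general idea in PyTorch, assuming a plain FFT-based mixer with a learnable complex filter; this illustrates spectral token mixing only, not the paper's scattering network, and all names are hypothetical:

import torch
import torch.nn as nn

class SpectralMixer(nn.Module):
    """Mixes tokens in the frequency domain instead of with self-attention."""
    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        # One learnable complex weight per (frequency, channel) pair,
        # stored as (real, imag) pairs in the last dimension.
        self.filter = nn.Parameter(torch.randn(num_tokens // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):                                  # x: (batch, tokens, dim)
        f = torch.fft.rfft(x, dim=1)                       # to the frequency domain
        f = f * torch.view_as_complex(self.filter)         # per-frequency gating
        return torch.fft.irfft(f, n=x.size(1), dim=1)      # back to token space

x = torch.randn(2, 196, 384)                               # 14x14 patches, 384-dim
print(SpectralMixer(196, 384)(x).shape)                    # torch.Size([2, 196, 384])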

You Only Need Less Attention at Each Stage in Vision Transformers

S Zhang, H Liu, S Lin, K He - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 - openaccess.thecvf.com
The advent of Vision Transformers (ViTs) marks a substantial paradigm shift in the
realm of computer vision. ViTs capture the global information of images through self …

LightViT: Towards light-weight convolution-free vision transformers

T Huang, L Huang, S You, F Wang, C Qian… - arXiv preprint arXiv …, 2022 - arxiv.org
Vision transformers (ViTs) are usually considered to be less light-weight than convolutional
neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to …

DMFormer: Closing the Gap Between CNN and Vision Transformers

Z Wei, H Pan, L Li, M Lu, X Niu… - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023 - ieeexplore.ieee.org
Vision transformers have shown excellent performance in computer vision tasks. As the
computation cost of their self-attention mechanism is expensive, recent works tried to …
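
The snippet gestures at the usual fix for expensive self-attention: swap in a cheaper token mixer. A hedged sketch of one common substitution, a depthwise-convolution block that mixes spatially at linear cost; this is a generic illustration, not DMFormer's actual block:

import torch
import torch.nn as nn

class ConvTokenMixer(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 7):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        # Depthwise conv mixes each channel spatially at O(N) cost,
        # versus O(N^2) for full self-attention over N tokens.
        self.mix = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.proj = nn.Conv2d(dim, dim, 1)                 # pointwise channel mixing

    def forward(self, x):                                  # x: (batch, dim, H, W)
        return x + self.proj(self.mix(self.norm(x)))

x = torch.randn(2, 96, 14, 14)
print(ConvTokenMixer(96)(x).shape)                         # torch.Size([2, 96, 14, 14])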

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

BN Patro, VP Namboodiri, VS Agneeswaran - arXiv preprint arXiv …, 2023 - arxiv.org
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT, DeiT, …
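
The title states the recipe directly: frequency layers plus attention layers. A minimal sketch, assuming GFNet-style spectral gating for the initial blocks followed by standard multi-head attention; depths, dimensions, and class names are illustrative, not the paper's configuration:

import torch
import torch.nn as nn

class SpectralGate(nn.Module):
    def __init__(self, h: int, w: int, dim: int):
        super().__init__()
        # Learnable complex filter over the 2D frequency grid.
        self.filter = nn.Parameter(torch.randn(h, w // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):                                  # x: (B, H, W, C)
        f = torch.fft.rfft2(x, dim=(1, 2))
        f = f * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(f, s=x.shape[1:3], dim=(1, 2))

class TinySpectFormer(nn.Module):
    def __init__(self, h=14, w=14, dim=192, n_spectral=2, n_attn=2):
        super().__init__()
        self.spectral = nn.ModuleList(SpectralGate(h, w, dim) for _ in range(n_spectral))
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, 4, batch_first=True) for _ in range(n_attn))

    def forward(self, x):                                  # x: (B, H, W, C)
        for gate in self.spectral:                         # frequency mixing first
            x = x + gate(x)
        b, h, w, c = x.shape
        t = x.reshape(b, h * w, c)                         # flatten to tokens
        for attn in self.attn:                             # attention afterwards
            t = t + attn(t, t, t, need_weights=False)[0]
        return t

print(TinySpectFormer()(torch.randn(2, 14, 14, 192)).shape)  # torch.Size([2, 196, 192])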

TransMix: Attend to mix for vision transformers

JN Chen, S Sun, J He, PHS Torr… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022 - openaccess.thecvf.com
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …
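
The key mechanism is mixing labels in proportion to where the model actually attends, not just to pixel area. A minimal sketch of that label-weight computation, assuming a CutMix-style pasted box and a class-token attention map (the attention interface is hypothetical); the mixed target would then be (1 - lam) * y_a + lam * y_b:

import torch

def transmix_lambda(attn_map, box):
    """attn_map: (B, H, W) class-token attention over the patch grid.
    box: (y0, y1, x0, x1) pasted region in patch coordinates."""
    y0, y1, x0, x1 = box
    inside = attn_map[:, y0:y1, x0:x1].flatten(1).sum(1)   # attention mass in the box
    total = attn_map.flatten(1).sum(1)
    return inside / total                                  # per-sample weight for label B

# With a uniform attention map this recovers the plain area-based lambda.
attn = torch.full((4, 14, 14), 1.0 / 196)
print(transmix_lambda(attn, (0, 7, 0, 7)))                 # ~0.25 for every sample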

Scalable vision transformers with hierarchical pooling

Z Pan, B Zhuang, J Liu, H He… - Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021 - openaccess.thecvf.com
The recently proposed Vision Transformers (ViT) with pure attention have achieved
promising performance on image recognition tasks, such as image classification. However …
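
The pooling idea is easy to state: between stages, downsample the token sequence so later blocks attend over fewer tokens. A minimal sketch, assuming simple max pooling along the token axis; stage depths and the pooling choice are illustrative, not the paper's exact design:

import torch
import torch.nn as nn

class PooledStages(nn.Module):
    def __init__(self, dim=192, heads=4):
        super().__init__()
        self.stage1 = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.stage2 = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.pool = nn.MaxPool1d(kernel_size=2, stride=2)  # halve the token count

    def forward(self, x):                                  # x: (B, N, C)
        x = self.stage1(x)
        x = self.pool(x.transpose(1, 2)).transpose(1, 2)   # (B, N/2, C)
        return self.stage2(x)                              # attention over fewer tokens

x = torch.randn(2, 196, 192)
print(PooledStages()(x).shape)                             # torch.Size([2, 98, 192])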

PPT: Token pruning and pooling for efficient vision transformers

X Wu, F Zeng, X Wang, Y Wang, X Chen - arXiv preprint arXiv:2310.01812, 2023 - arxiv.org
Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision,
delivering superior performance across various vision tasks. However, the high …
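
The title names two complementary moves: prune unimportant tokens, and pool the rest instead of discarding them. A minimal sketch, assuming tokens are scored by a hypothetical class-token attention vector; the pruned tokens are merged into one summary token:

import torch

def prune_and_pool(tokens, scores, keep: int):
    """tokens: (B, N, C); scores: (B, N) importance per token (assumed given)."""
    idx = scores.argsort(dim=1, descending=True)
    keep_idx, drop_idx = idx[:, :keep], idx[:, keep:]
    gather = lambda i: tokens.gather(1, i.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
    kept, dropped = gather(keep_idx), gather(drop_idx)
    # Pool the pruned tokens into a single summary token, weighted by score,
    # so their information is compressed rather than thrown away.
    w = scores.gather(1, drop_idx).softmax(dim=1).unsqueeze(-1)
    summary = (dropped * w).sum(dim=1, keepdim=True)
    return torch.cat([kept, summary], dim=1)               # (B, keep + 1, C)

x, s = torch.randn(2, 196, 192), torch.rand(2, 196)
print(prune_and_pool(x, s, keep=98).shape)                 # torch.Size([2, 99, 192])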

Vision transformers with patch diversification

C Gong, D Wang, M Li, V Chandra, Q Liu - arXiv preprint arXiv:2104.12753, 2021 - arxiv.org
Vision transformer has demonstrated promising performance on challenging computer
vision tasks. However, directly training the vision transformers may yield unstable and sub …
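
One way to read "patch diversification" is as a regularizer that keeps patch representations from collapsing into near-identical vectors. A hedged sketch of that generic idea, penalizing pairwise cosine similarity between patches; the paper's actual losses may differ:

import torch
import torch.nn.functional as F

def patch_diversity_loss(tokens):
    """tokens: (B, N, C) patch representations from some transformer layer."""
    t = F.normalize(tokens, dim=-1)
    sim = t @ t.transpose(1, 2)                            # (B, N, N) cosine similarities
    eye = torch.eye(sim.size(1), device=sim.device)
    return (sim - eye).abs().mean()                        # zero self-similarity, penalize the rest

x = torch.randn(2, 196, 192)
print(patch_diversity_loss(x))                             # scalar regularization term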

Less is more: Pay less attention in vision transformers

Z Pan, B Zhuang, H He, J Liu, J Cai - Proceedings of the AAAI Conference on Artificial Intelligence, 2022 - ojs.aaai.org
Transformers have become one of the dominant architectures in deep learning, particularly
as a powerful alternative to convolutional neural networks (CNNs) in computer vision …
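
The title's idea can be sketched directly: spend attention only where it pays, using attention-free blocks elsewhere. A minimal illustration, assuming plain residual MLP blocks in the early layers and multi-head attention only in the later ones; depths and dimensions are made up:

import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
            nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.net(x)                             # residual MLP, no token mixing

class LessAttentionNet(nn.Module):
    def __init__(self, dim=192, heads=4, n_mlp=4, n_attn=4):
        super().__init__()
        self.early = nn.Sequential(*[MlpBlock(dim) for _ in range(n_mlp)])
        self.late = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(n_attn))

    def forward(self, x):                                  # x: (B, N, C)
        x = self.early(x)                                  # attention-free early stages
        for attn in self.late:                             # attention only at the end
            x = x + attn(x, x, x, need_weights=False)[0]
        return x

x = torch.randn(2, 196, 192)
print(LessAttentionNet()(x).shape)                         # torch.Size([2, 196, 192])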