Scattering vision transformer: Spectral mixing matters
B Patro, V Agneeswaran - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Vision transformers have gained significant attention and achieved state-of-the-art
performance in various computer vision tasks, including image classification, instance …
You Only Need Less Attention at Each Stage in Vision Transformers
The advent of Vision Transformers (ViTs) marks a substantial paradigm shift in the
realm of computer vision. ViTs capture the global information of images through self …
LightViT: Towards light-weight convolution-free vision transformers
Vision transformers (ViTs) are usually considered to be less light-weight than convolutional
neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to …
DMFormer: Closing the Gap Between CNN and Vision Transformers
Vision transformers have shown excellent performance in computer vision tasks. As the
computation cost of their self-attention mechanism is expensive, recent works tried to …
SpectFormer: Frequency and Attention is what you need in a Vision Transformer
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT \cite{dosovitskiy2020image}, DeiT \cite …
TransMix: Attend to mix for vision transformers
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …
Scalable vision transformers with hierarchical pooling
The recently proposed Visual image Transformers (ViT) with pure attention have achieved
promising performance on image recognition tasks, such as image classification. However …
PPT: Token pruning and pooling for efficient vision transformers
Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision,
delivering superior performance across various vision tasks. However, the high …
Vision transformers with patch diversification
Vision transformer has demonstrated promising performance on challenging computer
vision tasks. However, directly training the vision transformers may yield unstable and sub …
Less is more: Pay less attention in vision transformers
Transformers have become one of the dominant architectures in deep learning, particularly
as a powerful alternative to convolutional neural networks (CNNs) in computer vision …