A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Incorporating convolution designs into visual transformers

K Yuan, S Guo, Z Liu, A Zhou… - Proceedings of the …, 2021 - openaccess.thecvf.com
Motivated by the success of Transformers in natural language processing (NLP) tasks, there
exist some attempts (e.g., ViT and DeiT) to apply Transformers to the vision domain. However …

Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Three things everyone should know about vision transformers

H Touvron, M Cord, A El-Nouby, J Verbeek… - European Conference on …, 2022 - Springer
After their initial success in natural language processing, transformer architectures have
rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as …

Rethinking spatial dimensions of vision transformers

B Heo, S Yun, D Han, S Chun… - Proceedings of the …, 2021 - openaccess.thecvf.com
Vision Transformer (ViT) extends the application range of transformers from
language processing to computer vision tasks as being an alternative architecture against …

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

BN Patro, VP Namboodiri, VS Agneeswaran - arXiv preprint arXiv …, 2023 - arxiv.org
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT, DeiT, …

Tokens-to-token ViT: Training vision transformers from scratch on ImageNet

L Yuan, Y Chen, T Wang, W Yu, Y Shi… - Proceedings of the …, 2021 - openaccess.thecvf.com
Transformers, which are popular for language modeling, have been explored for solving
vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model …

RegionViT: Regional-to-local attention for vision transformers

CF Chen, R Panda, Q Fan - arXiv preprint arXiv:2106.02689, 2021 - arxiv.org
Vision transformer (ViT) has recently shown its strong capability in achieving comparable
results to convolutional neural networks (CNNs) on image classification. However, vanilla …

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …