Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Tokens-to-Token ViT: Training vision transformers from scratch on ImageNet

L Yuan, Y Chen, T Wang, W Yu, Y Shi… - Proceedings of the …, 2021 - openaccess.thecvf.com
Transformers, which are popular for language modeling, have been explored for solving
vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model …

Transformers in computational visual media: A survey

Y Xu, H Wei, M Lin, Y Deng, K Sheng, M Zhang… - Computational Visual …, 2022 - Springer
Transformers, the dominant architecture for natural language processing, have also recently
attracted much attention from computational visual media researchers due to their capacity …

Long-short transformer: Efficient transformers for language and vision

C Zhu, W Ping, C Xiao, M Shoeybi… - Advances in neural …, 2021 - proceedings.neurips.cc
Transformers have achieved success in both language and vision domains. However, it is
prohibitively expensive to scale them to long sequences such as long documents or high …

Incorporating convolution designs into visual transformers

K Yuan, S Guo, Z Liu, A Zhou… - Proceedings of the …, 2021 - openaccess.thecvf.com
Motivated by the success of Transformers in natural language processing (NLP) tasks, there
exist some attempts (e.g., ViT and DeiT) to apply Transformers to the vision domain. However …

Learning to merge tokens in vision transformers

C Renggli, AS Pinto, N Houlsby, B Mustafa… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformers are widely applied to solve natural language understanding and computer
vision tasks. While scaling up these architectures leads to improved performance, it often …

Pay attention to MLPs

H Liu, Z Dai, D So, QV Le - Advances in neural information …, 2021 - proceedings.neurips.cc
Transformers have become one of the most important architectural innovations in deep
learning and have enabled many breakthroughs over the past few years. Here we propose a …

LocalViT: Bringing locality to vision transformers

Y Li, K Zhang, J Cao, R Timofte, L Van Gool - arXiv preprint arXiv …, 2021 - arxiv.org
We study how to introduce locality mechanisms into vision transformers. The transformer
network originates from machine translation and is particularly good at modelling long-range …