DaViT: Dual attention vision transformers

M Ding, B Xiao, N Codella, P Luo, J Wang… - European conference on …, 2022 - Springer
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective
vision transformer architecture that is able to capture global context while maintaining …

Pale transformer: A general vision transformer backbone with pale-shaped attention

S Wu, T Wu, H Tan, G Guo - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Recently, Transformers have shown promising performance in various vision tasks. To
reduce the quadratic computation complexity caused by the global self-attention, various …

Slide-Transformer: Hierarchical vision transformer with local self-attention

X Pan, T Ye, Z Xia, S Song… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The self-attention mechanism has been a key factor in the recent progress of the Vision Transformer
(ViT), enabling adaptive feature extraction from global contexts. However, existing self …

Swin transformer: Hierarchical vision transformer using shifted windows

Z Liu, Y Lin, Y Cao, H Hu, Y Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …
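The shifted-window design starts from plain window partitioning: self-attention is restricted to non-overlapping local windows so its cost grows linearly with image size instead of quadratically. A minimal NumPy sketch of that partitioning step (function name and shapes are illustrative, not taken from the paper's code):

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns (num_windows, window_size*window_size, C), so self-attention
    can run within each window independently, reducing cost from
    O((H*W)^2) to O(H*W * window_size^2). Assumes H and W are divisible
    by window_size.
    """
    H, W, C = x.shape
    ws = window_size
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # group (window_row, window_col) together, then flatten each window
    windows = x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)
    return windows

x = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
wins = window_partition(x, 4)
print(wins.shape)  # (4, 16, 3)
```

The shifted-window step then offsets the grid between consecutive layers so information flows across window boundaries; the partition above is the unshifted base case.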

Focal attention for long-range interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability to capture local and global visual dependencies through …

FLatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
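Linear attention sidesteps the quadratic cost by replacing the softmax kernel with a feature map φ, so attention can be computed as φ(Q)(φ(K)ᵀV): associativity lets the d×d matrix φ(K)ᵀV be formed first, avoiding the N×N score matrix entirely. A minimal sketch using a simple ReLU feature map (the paper's focused mapping differs; all names here are illustrative):

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized attention: phi(Q) (phi(K)^T V), costing O(N * d^2).

    phi must be non-negative so the row-wise normalizer Z is positive;
    the ReLU-plus-epsilon map here is a simple stand-in choice.
    """
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                 # (d, d): built before touching N x N
    Z = Qp @ Kp.sum(axis=0)       # (N,) row normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (6, 4)
```

For sequence length N and head dimension d with N ≫ d, this trades the O(N²d) softmax attention for O(Nd²), which is what makes linear attention attractive for high-resolution vision inputs.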

CSWin Transformer: A general vision transformer backbone with cross-shaped windows

X Dong, J Bao, D Chen, W Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present CSWin Transformer, an efficient and effective Transformer-based
backbone for general-purpose vision tasks. A challenging issue in Transformer design is …

RegionViT: Regional-to-local attention for vision transformers

CF Chen, R Panda, Q Fan - arXiv preprint arXiv:2106.02689, 2021 - arxiv.org
The Vision Transformer (ViT) has recently shown its strong capability in achieving results
comparable to convolutional neural networks (CNNs) on image classification. However, vanilla …

Vision transformer with deformable attention

Z Xia, X Pan, S Song, LE Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Transformers have recently shown superior performance on various vision tasks. The large,
sometimes even global, receptive field endows Transformer models with higher …

KVT: k-NN Attention for Boosting Vision Transformers

P Wang, X Wang, F Wang, M Lin, S Chang, H Li… - European conference on …, 2022 - Springer
Convolutional Neural Networks (CNNs) have dominated computer vision for years,
due to their ability to capture locality and translation invariance. Recently, many vision …
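The k-NN attention idea can be sketched by masking each query's scores down to its top-k keys before the softmax, so every token aggregates only its most similar tokens and irrelevant ones are filtered out. A hedged NumPy illustration (not the authors' implementation; names are ours):

```python
import numpy as np

def knn_attention(Q, K, V, k):
    """k-NN attention: each query attends only to its top-k keys.

    Scores outside the per-row top-k are set to -inf before the
    softmax, producing a sparse attention pattern.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (N, N)
    # indices of the k largest scores in each row
    topk = np.argpartition(scores, -k, axis=-1)[:, -k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, topk, 0.0, axis=-1)
    masked = scores + mask
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 3)) for _ in range(3))
print(knn_attention(Q, K, V, k=2).shape)  # (5, 3)
```

With k equal to the sequence length the mask is all zeros and this reduces to standard softmax attention, which makes the sparsification easy to sanity-check.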