Transformers in medical imaging: A survey

F Shamshad, S Khan, SW Zamir, MH Khan… - Medical Image …, 2023 - Elsevier
Following unprecedented success on natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …

Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

Designing network design strategies through gradient path analysis

CY Wang, HYM Liao, IH Yeh - arXiv preprint arXiv:2211.04800, 2022 - arxiv.org
Designing a high-efficiency and high-quality expressive network architecture has always
been the most important research topic in the field of deep learning. Most of today's network …

Denseclip: Language-guided dense prediction with context-aware prompting

Y Rao, W Zhao, G Chen, Y Tang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent progress has shown that large-scale pre-training using contrastive image-text pairs
can be a promising alternative for high-quality visual representation learning from natural …

Video swin transformer

Z Liu, J Ning, Y Cao, Y Wei, Z Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …

Segmenter: Transformer for semantic segmentation

R Strudel, R Garcia, I Laptev… - Proceedings of the …, 2021 - openaccess.thecvf.com
Image segmentation is often ambiguous at the level of individual image patches and
requires contextual information to reach label consensus. In this paper we introduce …

Contextual transformer networks for visual recognition

Y Li, T Yao, Y Pan, T Mei - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org
Transformer with self-attention has revolutionized the natural language processing
field and recently inspired the emergence of Transformer-style architecture design with …

You only learn one representation: Unified network for multiple tasks

CY Wang, IH Yeh, HYM Liao - arXiv preprint arXiv:2105.04206, 2021 - arxiv.org
People "understand" the world via vision, hearing, touch, and also past experience.
Human experience can be learned through normal learning (we call it explicit knowledge) …

Swin transformer: Hierarchical vision transformer using shifted windows

Z Liu, Y Lin, Y Cao, H Hu, Y Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …

Focal self-attention for local-global interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao, L Yuan… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability to capture short- and long-range visual dependencies …