Transformers in medical imaging: A survey

F Shamshad, S Khan, SW Zamir, MH Khan… - Medical Image …, 2023 - Elsevier
Following unprecedented success on natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …

Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

Designing network design strategies through gradient path analysis

CY Wang, HYM Liao, IH Yeh - arXiv preprint arXiv:2211.04800, 2022 - arxiv.org
Designing a high-efficiency and high-quality expressive network architecture has always
been the most important research topic in the field of deep learning. Most of today's network …

Denseclip: Language-guided dense prediction with context-aware prompting

Y Rao, W Zhao, G Chen, Y Tang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent progress has shown that large-scale pre-training using contrastive image-text pairs
can be a promising alternative for high-quality visual representation learning from natural …

Video swin transformer

Z Liu, J Ning, Y Cao, Y Wei, Z Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …

Segmenter: Transformer for semantic segmentation

R Strudel, R Garcia, I Laptev… - Proceedings of the …, 2021 - openaccess.thecvf.com
Image segmentation is often ambiguous at the level of individual image patches and
requires contextual information to reach label consensus. In this paper we introduce …

Contextual transformer networks for visual recognition

Y Li, T Yao, Y Pan, T Mei - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org
Transformer with self-attention has revolutionized the natural language processing
field and recently inspired the emergence of Transformer-style architecture design with …

You only learn one representation: Unified network for multiple tasks

CY Wang, IH Yeh, HYM Liao - arXiv preprint arXiv:2105.04206, 2021 - arxiv.org
People "understand" the world via vision, hearing, touch, and also past experience.
Human experience can be learned through normal learning (we call it explicit knowledge) …

Swin transformer: Hierarchical vision transformer using shifted windows

Z Liu, Y Lin, Y Cao, H Hu, Y Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …

Focal self-attention for local-global interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao, L Yuan… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability to capture short- and long-range visual dependencies …