A comprehensive survey of transformers for computer vision

S Jamil, M Jalil Piran, OJ Kwon - Drones, 2023 - mdpi.com
As a special type of transformer, vision transformers (ViTs) can be used for various computer
vision (CV) applications. Convolutional neural networks (CNNs) have several potential …

Large language models in healthcare and medical domain: A review

ZA Nazi, W Peng - Informatics, 2024 - mdpi.com
The deployment of large language models (LLMs) within the healthcare sector has sparked
both enthusiasm and apprehension. These models exhibit the remarkable ability to provide …

MetaFormer is actually what you need for vision

W Yu, M Luo, P Zhou, C Si, Y Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
Transformers have shown great potential in computer vision tasks. A common belief is their
attention-based token mixer module contributes most to their competence. However, recent …

Understanding the robustness in vision transformers

D Zhou, Z Yu, E Xie, C Xiao… - International …, 2022 - proceedings.mlr.press
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

VOLO: Vision outlooker for visual recognition

L Yuan, Q Hou, Z Jiang, J Feng… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. With
low efficiency in encoding fine-level features, the performance of ViTs is still inferior to the …

CDTrans: Cross-domain transformer for unsupervised domain adaptation

T Xu, W Chen, P Wang, F Wang, H Li, R Jin - arXiv preprint arXiv …, 2021 - arxiv.org
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled
source domain to a different unlabeled target domain. Most existing UDA methods focus on …
