A comprehensive survey of transformers for computer vision

S Jamil, M Jalil Piran, OJ Kwon - Drones, 2023 - mdpi.com
As a special type of transformer, vision transformers (ViTs) can be used for various computer
vision (CV) applications. Convolutional neural networks (CNNs) have several potential …

Large language models in healthcare and medical domain: A review

ZA Nazi, W Peng - Informatics, 2024 - mdpi.com
The deployment of large language models (LLMs) within the healthcare sector has sparked
both enthusiasm and apprehension. These models exhibit the remarkable ability to provide …

MetaFormer is actually what you need for vision

W Yu, M Luo, P Zhou, C Si, Y Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
Transformers have shown great potential in computer vision tasks. A common belief is their
attention-based token mixer module contributes most to their competence. However, recent …

Understanding the robustness in vision transformers

D Zhou, Z Yu, E Xie, C Xiao… - International …, 2022 - proceedings.mlr.press
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

VOLO: Vision outlooker for visual recognition

L Yuan, Q Hou, Z Jiang, J Feng… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. With
low efficiency in encoding fine-level features, the performance of ViTs is still inferior to the …

CDTrans: Cross-domain transformer for unsupervised domain adaptation

T Xu, W Chen, P Wang, F Wang, H Li, R Jin - arXiv preprint arXiv …, 2021 - arxiv.org
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled
source domain to a different unlabeled target domain. Most existing UDA methods focus on …
