A comprehensive survey on source-free domain adaptation

J Li, Z Yu, Z Du, L Zhu, HT Shen - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Over the past decade, domain adaptation has become a widely studied branch of transfer
learning, which aims to improve performance on target domains by leveraging knowledge …

Semantic image segmentation: Two decades of research

G Csurka, R Volpi, B Chidlovskii - Foundations and Trends® …, 2022 - nowpublishers.com
Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer
vision applications, providing key information for the global understanding of an image. This …

OneFormer: One transformer to rule universal image segmentation

J Jain, J Li, MT Chiu, A Hassani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Universal Image Segmentation is not a new concept. Past attempts to unify image
segmentation include scene parsing, panoptic segmentation, and, more recently, new …

FLatten Transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computational complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
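The complexity difference this snippet alludes to comes from reordering one matrix product. A minimal NumPy sketch of generic linear attention follows; the kernel feature map `phi` (ReLU plus a small constant) is an illustrative assumption, not the focused mapping proposed in the paper:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an N x N score matrix, O(N^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Linear attention replaces softmax(Q K^T) with phi(Q) phi(K)^T and
    # regroups the product as phi(Q) @ (phi(K)^T V): the (d x d) summary
    # is independent of sequence length, giving O(N * d^2) overall.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                    # (d, d) key-value summary
    z = Qp @ Kp.sum(axis=0)          # per-query normalizer, shape (N,)
    return (Qp @ kv) / z[:, None]
```

Because `phi` is non-negative, each output row is still a normalized weighted average of the value rows, as in softmax attention, but the N x N weight matrix is never formed explicitly.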

Agent attention: On the integration of softmax and linear attention

D Han, T Ye, Y Han, Z Xia, S Pan, P Wan… - … on Computer Vision, 2025 - Springer
The attention module is the key component in Transformers. While the global attention
mechanism offers high expressiveness, its excessive computational cost restricts its …
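The integration this snippet describes can be sketched as a two-stage attention with a small set of intermediary "agent" tokens, so the N x N interaction is never computed. This is a simplified illustration of the general idea; obtaining agents by mean-pooling chunks of the queries is an assumption made here for brevity, not the paper's exact construction:

```python
import numpy as np

def _softmax_attn(Q, K, V):
    # Plain softmax attention, used as the building block for both stages.
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

def agent_attention(Q, K, V, n_agents=4):
    # A few agent tokens A mediate between queries and keys: agents first
    # attend to the full sequence (n x N scores), then queries attend to
    # the agents (N x n scores), for O(N * n * d) total cost.
    N, d = Q.shape
    # Illustrative agent construction: mean-pooled chunks of Q (assumed).
    A = Q[: (N // n_agents) * n_agents].reshape(n_agents, -1, d).mean(axis=1)
    agent_values = _softmax_attn(A, K, V)      # agents summarize K/V
    return _softmax_attn(Q, A, agent_values)   # queries read the summary
```

Each output row remains a convex combination of the value rows, preserving the averaging behavior of softmax attention while keeping the cost linear in sequence length for fixed `n_agents`.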

Conv2Former: A simple transformer-style convnet for visual recognition

Q Hou, CZ Lu, MM Cheng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformers have recently been the most popular network architecture in visual
recognition due to their strong ability to encode global information. However, their high …

Matting anything

J Li, J Jain, H Shi - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile
framework for estimating the alpha matte of any instance in an image with flexible and …

MetaFormer baselines for vision

W Yu, C Si, P Zhou, M Luo, Y Zhou… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant
role in achieving competitive performance. In this paper, we further explore the capacity of …

RMT: Retentive networks meet vision transformers

Q Fan, H Huang, M Chen, H Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The Vision Transformer (ViT) has gained increasing attention in the computer vision
community in recent years. However, the core component of ViT, Self-Attention, lacks explicit …

Slide-transformer: Hierarchical vision transformer with local self-attention

X Pan, T Ye, Z Xia, S Song… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The self-attention mechanism has been a key factor in the recent progress of the Vision
Transformer (ViT), enabling adaptive feature extraction from global contexts. However, existing self …