- Academic resource search

Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org

Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Cited by 2767 · Related articles · All 8 versions

Vision transformers for dense prediction: A survey

S Zuo, Y Xiao, X Chang, X Wang - Knowledge-Based Systems, 2022 - Elsevier

Transformers have demonstrated impressive expressiveness and transfer capability in
computer vision fields. Dense prediction is a fundamental problem in computer vision that is …

Cited by 48 · Related articles · All 3 versions

Large selective kernel network for remote sensing object detection

Y Li, Q Hou, Z Zheng, MM Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …

Cited by 296 · Related articles · All 7 versions

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org

This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …

Cited by 567 · Related articles · All 3 versions

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer

While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …

Cited by 739 · Related articles · All 8 versions

SegViT: Semantic segmentation with plain vision transformers

B Zhang, Z Tian, Q Tang, X Chu… - Advances in Neural …, 2022 - proceedings.neurips.cc

We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and
propose the SegViT. Previous ViT-based segmentation networks usually learn a pixel-level …

Cited by 122 · Related articles · All 6 versions

Delivering arbitrary-modal semantic segmentation

J Zhang, R Liu, H Shi, K Yang, S Reiß… - Proceedings of the …, 2023 - openaccess.thecvf.com

Multimodal fusion can make semantic segmentation more robust. However, fusing an
arbitrary number of modalities remains underexplored. To delve into this problem, we create …

Cited by 92 · Related articles · All 7 versions

Focal network for image restoration

Y Cui, W Ren, X Cao, A Knoll - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Image restoration aims to reconstruct a sharp image from its degraded counterpart, which
plays an important role in many fields. Recently, Transformer models have achieved …

Cited by 71 · Related articles · All 5 versions

JCS: An explainable COVID-19 diagnosis system by joint classification and segmentation

YH Wu, SH Gao, J Mei, J Xu, DP Fan… - … on Image Processing, 2021 - ieeexplore.ieee.org

Recently, the coronavirus disease 2019 (COVID-19) has caused a pandemic disease in
over 200 countries, influencing billions of humans. To control the infection, identifying and …

Cited by 519 · Related articles · All 10 versions

Centralized feature pyramid for object detection

Y Quan, D Zhang, L Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

The visual feature pyramid has shown its superiority in both effectiveness and efficiency in a
variety of applications. However, current methods overly focus on inter-layer feature …

Cited by 163 · Related articles · All 6 versions