Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Vision transformers for dense prediction: A survey

S Zuo, Y Xiao, X Chang, X Wang - Knowledge-Based Systems, 2022 - Elsevier
Transformers have demonstrated impressive expressiveness and transfer capability in
computer vision fields. Dense prediction is a fundamental problem in computer vision that is …

Large selective kernel network for remote sensing object detection

Y Li, Q Hou, Z Zheng, MM Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …

SegViT: Semantic segmentation with plain vision transformers

B Zhang, Z Tian, Q Tang, X Chu… - Advances in Neural …, 2022 - proceedings.neurips.cc
We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and
propose the SegViT. Previous ViT-based segmentation networks usually learn a pixel-level …

Delivering arbitrary-modal semantic segmentation

J Zhang, R Liu, H Shi, K Yang, S Reiß… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimodal fusion can make semantic segmentation more robust. However, fusing an
arbitrary number of modalities remains underexplored. To delve into this problem, we create …

Focal network for image restoration

Y Cui, W Ren, X Cao, A Knoll - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image restoration aims to reconstruct a sharp image from its degraded counterpart, which
plays an important role in many fields. Recently, Transformer models have achieved …

JCS: An explainable COVID-19 diagnosis system by joint classification and segmentation

YH Wu, SH Gao, J Mei, J Xu, DP Fan… - … on Image Processing, 2021 - ieeexplore.ieee.org
Recently, the coronavirus disease 2019 (COVID-19) has caused a pandemic disease in
over 200 countries, influencing billions of humans. To control the infection, identifying and …

Centralized feature pyramid for object detection

Y Quan, D Zhang, L Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The visual feature pyramid has shown its superiority in both effectiveness and efficiency in a
variety of applications. However, current methods overly focus on inter-layer feature …