Vision transformers for dense prediction: A survey

S Zuo, Y Xiao, X Chang, X Wang - Knowledge-Based Systems, 2022 - Elsevier
Transformers have demonstrated impressive expressiveness and transfer capability in
computer vision fields. Dense prediction is a fundamental problem in computer vision that is …

ViTPose: Simple vision transformer baselines for human pose estimation

Y Xu, J Zhang, Q Zhang, D Tao - Advances in Neural …, 2022 - proceedings.neurips.cc
Although no specific domain knowledge is considered in the design, plain vision
transformers have shown excellent performance in visual recognition tasks. However, little …

EdgeNeXt: Efficiently amalgamated CNN-transformer architecture for mobile vision applications

M Maaz, A Shaker, H Cholakkal, S Khan… - European conference on …, 2022 - Springer
In the pursuit of achieving ever-increasing accuracy, large and complex neural networks are
usually developed. Such models demand high computational resources and therefore …

SwinBTS: A method for 3D multimodal brain tumor segmentation using Swin Transformer

Y Jiang, Y Zhang, X Lin, J Dong, T Cheng, J Liang - Brain sciences, 2022 - mdpi.com
Brain tumor semantic segmentation is a critical medical image processing work, which aids
clinicians in diagnosing patients and determining the extent of lesions. Convolutional neural …

Transformer meets remote sensing video detection and tracking: A comprehensive survey

L Jiao, X Zhang, X Liu, F Liu, S Yang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Transformer has shown excellent performance in remote sensing field with long-range
modeling capabilities. Remote sensing video (RSV) moving object detection and tracking …

KVT: k-NN Attention for Boosting Vision Transformers

P Wang, X Wang, F Wang, M Lin, S Chang, H Li… - European conference on …, 2022 - Springer
Convolutional Neural Networks (CNNs) have dominated computer vision for years,
due to its ability in capturing locality and translation invariance. Recently, many vision …

VTC-LFC: Vision transformer compression with low-frequency components

Z Wang, H Luo, P Wang, F Ding… - Advances in Neural …, 2022 - proceedings.neurips.cc
Although Vision transformers (ViTs) have recently dominated many vision tasks,
deploying ViT models on resource-limited devices remains a challenging problem. To …

Revitalizing convolutional network for image restoration

Y Cui, W Ren, X Cao, A Knoll - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Image restoration aims to reconstruct a high-quality image from its corrupted version, playing
essential roles in many scenarios. Recent years have witnessed a paradigm shift in image …

Making vision transformers efficient from a token sparsification view

S Chang, P Wang, M Lin, F Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computational complexity to the number of tokens limits the practical
applications of Vision Transformers (ViTs). Several works propose to prune redundant …

ViTPose++: Vision transformer for generic body pose estimation

Y Xu, J Zhang, Q Zhang, D Tao - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org
In this paper, we show the surprisingly good properties of plain vision transformers for body
pose estimation from various aspects, namely simplicity in model structure, scalability in …