Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Are we ready for a new paradigm shift? A survey on visual deep MLP

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com
Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

Scaling vision transformers to gigapixel images via hierarchical self-supervised learning

RJ Chen, C Chen, Y Li, TY Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision Transformers (ViTs) and their multi-scale and hierarchical variations have
been successful at capturing image representations but their use has been generally …

Learning to prompt for continual learning

Z Wang, Z Zhang, CY Lee, H Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
The mainstream paradigm behind continual learning has been to adapt the model
parameters to non-stationary data distributions, where catastrophic forgetting is the central …

Compute trends across three eras of machine learning

J Sevilla, L Heim, A Ho, T Besiroglu… - … Joint Conference on …, 2022 - ieeexplore.ieee.org
Compute, data, and algorithmic advances are the three fundamental factors that drive
progress in modern Machine Learning (ML). In this paper we study trends in the most readily …

Uformer: A general U-shaped transformer for image restoration

Z Wang, X Cun, J Bao, W Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we present Uformer, an effective and efficient Transformer-based architecture
for image restoration, in which we build a hierarchical encoder-decoder network using the …

A generalist framework for panoptic segmentation of images and videos

T Chen, L Li, S Saxena, G Hinton… - Proceedings of the …, 2023 - openaccess.thecvf.com
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image.
As permutations of instance IDs are also valid solutions, the task requires learning of high …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Hyperspectral image transformer classification networks

X Yang, W Cao, Y Lu, Y Zhou - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Hyperspectral image (HSI) classification is an important task in earth observation missions.
Convolution neural networks (CNNs) with the powerful ability of feature extraction have …

CrossFormer++: A versatile vision transformer hinging on cross-scale attention

W Wang, W Chen, Q Qiu, L Chen, B Wu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
While features of different scales are perceptually important to visual inputs, existing vision
transformers do not yet take advantage of them explicitly. To this end, we first propose a …