Transformers in vision: A survey
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …
Are we ready for a new paradigm shift? A survey on visual deep MLP
Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …
Scaling vision transformers to gigapixel images via hierarchical self-supervised learning
Vision Transformers (ViTs) and their multi-scale and hierarchical variations have
been successful at capturing image representations but their use has been generally …
Learning to prompt for continual learning
The mainstream paradigm behind continual learning has been to adapt the model
parameters to non-stationary data distributions, where catastrophic forgetting is the central …
Compute trends across three eras of machine learning
Compute, data, and algorithmic advances are the three fundamental factors that drive
progress in modern Machine Learning (ML). In this paper we study trends in the most readily …
Uformer: A general u-shaped transformer for image restoration
In this paper, we present Uformer, an effective and efficient Transformer-based architecture
for image restoration, in which we build a hierarchical encoder-decoder network using the …
A generalist framework for panoptic segmentation of images and videos
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image.
As permutations of instance IDs are also valid solutions, the task requires learning of high …
A survey on vision transformer
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
Hyperspectral image transformer classification networks
Hyperspectral image (HSI) classification is an important task in earth observation missions.
Convolutional neural networks (CNNs) with the powerful ability of feature extraction have …
Crossformer++: A versatile vision transformer hinging on cross-scale attention
While features of different scales are perceptually important to visual inputs, existing vision
transformers do not yet take advantage of them explicitly. To this end, we first propose a …