Attention mechanisms in computer vision: A survey
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …
Transformers in medical image analysis
Transformers have dominated the field of natural language processing and have recently
made an impact in the area of computer vision. In the field of medical image analysis …
Vision mamba: Efficient visual representation learning with bidirectional state space model
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …
Exploring plain vision transformer backbones for object detection
We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for
object detection. This design enables the original ViT architecture to be fine-tuned for object …
Scaling up your kernels to 31×31: Revisiting large kernel design in CNNs
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
Visual attention network
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …
UNeXt: MLP-based rapid medical image segmentation network
JMJ Valanarasu, VM Patel - … conference on medical image computing and …, 2022 - Springer
UNet and its recent extensions, such as TransUNet, have been the leading medical image
segmentation methods in recent years. However, these networks cannot be effectively …
DeiT III: Revenge of the ViT
The Vision Transformer (ViT) is a simple neural architecture amenable to serving
several computer vision tasks. It has limited built-in architectural priors, in contrast to more …
MAXIM: Multi-axis MLP for image processing
Recent progress on Transformer and multi-layer perceptron (MLP) models provides new
network architectural designs for computer vision tasks. Although these models proved to be …
MetaFormer is actually what you need for vision
Transformers have shown great potential in computer vision tasks. A common belief is that their
attention-based token mixer module contributes most to their competence. However, recent …