Transformers in medical imaging: A survey
Following unprecedented success on the natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …
Attention mechanisms in computer vision: A survey
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …
SegNeXt: Rethinking convolutional attention design for semantic segmentation
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …
Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional CLIP
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …
CDDFuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion
Multi-modality (MM) image fusion aims to render fused images that maintain the merits of
different modalities, e.g., functional highlight and detailed textures. To tackle the challenge in …
Large selective kernel network for remote sensing object detection
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …
AdaptFormer: Adapting vision transformers for scalable visual recognition
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …
Vision transformer adapter for dense predictions
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …
SwinFusion: Cross-domain long-range learning for general image fusion via Swin Transformer
This study proposes a novel general image fusion framework based on cross-domain long-
range learning and Swin Transformer, termed as SwinFusion. On the one hand, an attention …
Visual prompt tuning
The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …