[HTML][HTML] Attention mechanisms in computer vision: A survey
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …
this observation, attention mechanisms were introduced into computer vision with the aim of …
Recent advances and clinical applications of deep learning in medical image analysis
Deep learning has received extensive research interest in developing new medical image
processing algorithms, and deep learning based models have been remarkably successful …
processing algorithms, and deep learning based models have been remarkably successful …
[HTML][HTML] Visual attention network
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …
mechanism has recently taken various computer vision areas by storm. However, the 2D …
Swin transformer v2: Scaling up capacity and resolution
We present techniques for scaling Swin Transformer [??] up to 3 billion parameters and
making it capable of training with images of up to 1,536 x1, 536 resolution. By scaling up …
making it capable of training with images of up to 1,536 x1, 536 resolution. By scaling up …
A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection
Object detection is a well-known task in the field of computer vision, especially the small
target detection problem that has aroused great academic attention. In order to improve the …
target detection problem that has aroused great academic attention. In order to improve the …
An end-to-end transformer model for 3d object detection
We propose 3DETR, an end-to-end Transformer based object detection model for 3D point
clouds. Compared to existing detection methods that employ a number of 3D-specific …
clouds. Compared to existing detection methods that employ a number of 3D-specific …
Video swin transformer
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …
Transformer architectures have attained top accuracy on the major video recognition …
A survey of visual transformers
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …
field of natural language processing (NLP). Inspired by such significant achievements, some …
Swin-unet: Unet-like pure transformer for medical image segmentation
In the past few years, convolutional neural networks (CNNs) have achieved milestones in
medical image analysis. In particular, deep neural networks based on U-shaped architecture …
medical image analysis. In particular, deep neural networks based on U-shaped architecture …
End-to-end semi-supervised object detection with soft teacher
Previous pseudo-label approaches for semi-supervised object detection typically follow a
multi-stage schema, with the first stage to train an initial detector on a few labeled data …
multi-stage schema, with the first stage to train an initial detector on a few labeled data …