Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

Recent advances and clinical applications of deep learning in medical image analysis

X Chen, X Wang, K Zhang, KM Fung, TC Thai… - Medical image …, 2022 - Elsevier
Deep learning has received extensive research interest in developing new medical image
processing algorithms, and deep learning based models have been remarkably successful …

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …

Swin transformer v2: Scaling up capacity and resolution

Z Liu, H Hu, Y Lin, Z Yao, Z Xie, Y Wei… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present techniques for scaling Swin Transformer up to 3 billion parameters and
making it capable of training with images of up to 1,536×1,536 resolution. By scaling up …

A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection

N Zeng, P Wu, Z Wang, H Li, W Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Object detection is a well-known task in the field of computer vision, especially the small
target detection problem that has aroused great academic attention. In order to improve the …

An end-to-end transformer model for 3D object detection

I Misra, R Girdhar, A Joulin - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We propose 3DETR, an end-to-end Transformer based object detection model for 3D point
clouds. Compared to existing detection methods that employ a number of 3D-specific …

Video swin transformer

Z Liu, J Ning, Y Cao, Y Wei, Z Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

Swin-unet: Unet-like pure transformer for medical image segmentation

H Cao, Y Wang, J Chen, D Jiang, X Zhang… - European conference on …, 2022 - Springer
In the past few years, convolutional neural networks (CNNs) have achieved milestones in
medical image analysis. In particular, deep neural networks based on U-shaped architecture …

End-to-end semi-supervised object detection with soft teacher

M Xu, Z Zhang, H Hu, J Wang, L Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Previous pseudo-label approaches for semi-supervised object detection typically follow a
multi-stage schema, with the first stage to train an initial detector on a few labeled data …