A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

PIDRo: Parallel isomeric attention with dynamic routing for text-video retrieval

P Guan, R Pei, B Shao, J Liu, W Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-video retrieval is a fundamental task with high practical value in multi-modal research.
Inspired by the great success of pre-trained image-text models with large-scale data, such …

RadFormer: Transformers with global–local attention for interpretable and accurate gallbladder cancer detection

S Basu, M Gupta, P Rana, P Gupta, C Arora - Medical Image Analysis, 2023 - Elsevier
We propose a novel deep neural network architecture to learn interpretable representation
for medical image analysis. Our architecture generates a global attention for region of …

OOD-CV: A benchmark for robustness to out-of-distribution shifts of individual nuisances in natural images

B Zhao, S Yu, W Ma, M Yu, S Mei, A Wang, J He… - European conference on …, 2022 - Springer
Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One
reason is that existing robustness benchmarks are limited, as they either rely on synthetic …

TransVCL: Attention-enhanced video copy localization network with flexible supervision

S He, Y He, M Lu, C Jiang, X Yang, F Qian… - Proceedings of the …, 2023 - ojs.aaai.org
Video copy localization aims to precisely localize all the copied segments within a pair of
untrimmed videos in video retrieval applications. Previous methods typically start from frame …

Spatial-temporal transformer for 3D point cloud sequences

Y Wei, H Liu, T Xie, Q Ke, Y Guo - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Effective learning of spatial-temporal information within a point cloud sequence is highly
important for many down-stream tasks such as 4D semantic segmentation and 3D action …

LoFormer: Local frequency transformer for image deblurring

X Mao, J Wang, X Xie, Q Li, Y Wang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Due to the computational complexity of self-attention (SA), prevalent techniques for image
deblurring often resort to either adopting localized SA or employing coarse-grained global …