A survey of transformers
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …
A survey on vision transformer
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
A survey on visual transformer
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
Video transformers: A survey
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …
PIDRo: Parallel isomeric attention with dynamic routing for text-video retrieval
P Guan, R Pei, B Shao, J Liu, W Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-video retrieval is a fundamental task with high practical value in multi-modal research.
Inspired by the great success of pre-trained image-text models with large-scale data, such …
RadFormer: Transformers with global–local attention for interpretable and accurate Gallbladder Cancer detection
We propose a novel deep neural network architecture to learn interpretable representation
for medical image analysis. Our architecture generates a global attention for region of …
OOD-CV: A benchmark for robustness to out-of-distribution shifts of individual nuisances in natural images
Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One
reason is that existing robustness benchmarks are limited, as they either rely on synthetic …
TransVCL: Attention-enhanced video copy localization network with flexible supervision
Video copy localization aims to precisely localize all the copied segments within a pair of
untrimmed videos in video retrieval applications. Previous methods typically start from frame …
Spatial-temporal transformer for 3D point cloud sequences
Effective learning of spatial-temporal information within a point cloud sequence is highly
important for many down-stream tasks such as 4D semantic segmentation and 3D action …
LoFormer: Local frequency transformer for image deblurring
Due to the computational complexity of self-attention (SA), prevalent techniques for image
deblurring often resort to either adopting localized SA or employing coarse-grained global …