Scattering vision transformer: Spectral mixing matters
B Patro, V Agneeswaran - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Vision transformers have gained significant attention and achieved state-of-the-art
performance in various computer vision tasks, including image classification, instance …
You Only Need Less Attention at Each Stage in Vision Transformers
The advent of Vision Transformers (ViTs) marks a substantial paradigm shift in the
realm of computer vision. ViTs capture the global information of images through self …
LightViT: Towards light-weight convolution-free vision transformers
Vision transformers (ViTs) are usually considered to be less light-weight than convolutional
neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to …
DMFormer: Closing the Gap Between CNN and Vision Transformers
Vision transformers have shown excellent performance in computer vision tasks. As the
computation cost of their self-attention mechanism is expensive, recent works tried to …
SpectFormer: Frequency and Attention is what you need in a Vision Transformer
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT \cite{dosovitskiy2020image}, DeiT \cite …
TransMix: Attend to mix for vision transformers
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …
Scalable vision transformers with hierarchical pooling
The recently proposed Visual image Transformers (ViT) with pure attention have achieved
promising performance on image recognition tasks, such as image classification. However …
PPT: Token pruning and pooling for efficient vision transformers
Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision,
delivering superior performance across various vision tasks. However, the high …
Vision transformers with patch diversification
Vision transformer has demonstrated promising performance on challenging computer
vision tasks. However, directly training the vision transformers may yield unstable and sub …
Less is more: Pay less attention in vision transformers
Transformers have become one of the dominant architectures in deep learning, particularly
as a powerful alternative to convolutional neural networks (CNNs) in computer vision …