Efficientvit: Memory efficient vision transformer with cascaded group attention
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …
Tinyvit: Fast pretraining distillation for small vision transformers
Vision transformer (ViT) has recently drawn great attention in computer vision due to its
remarkable model capability. However, most prevailing ViT models suffer from huge number …
Fact: Factor-tuning for lightweight adaptation on vision transformer
Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by
updating only a few parameters so as to improve storage efficiency, called parameter …
Mobilevitv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features
SN Wadekar, A Chaurasia - arXiv preprint arXiv:2209.15159, 2022 - arxiv.org
MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision
transformers (ViTs) to create light-weight models for mobile vision tasks. Though the main …
Mixformerv2: Efficient fully transformer tracking
Transformer-based trackers have achieved strong accuracy on the standard benchmarks.
However, their efficiency remains an obstacle to practical deployment on both GPU and …
I-vit: Integer-only quantization for efficient vision transformer inference
Abstract Vision Transformers (ViTs) have achieved state-of-the-art performance on various
computer vision applications. However, these models have considerable storage and …
Repq-vit: Scale reparameterization for post-training quantization of vision transformers
Abstract Post-training quantization (PTQ), which only requires a tiny dataset for calibration
without end-to-end retraining, is a light and practical model compression technique …
A good student is cooperative and reliable: CNN-transformer collaborative learning for semantic segmentation
In this paper, we strive to answer the question 'how to collaboratively learn convolutional
neural network (CNN)-based and vision transformer (ViT)-based models by selecting and …
Efficient high-resolution deep learning: A survey
Cameras in modern devices such as smartphones, satellites and medical equipment are
capable of capturing very high resolution images and videos. Such high-resolution data …
Riformer: Keep your vision backbone effective but removing token mixer
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are …