EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

TinyViT: Fast pretraining distillation for small vision transformers

K Wu, J Zhang, H Peng, M Liu, B Xiao, J Fu… - European conference on …, 2022 - Springer
Vision transformer (ViT) has recently drawn great attention in computer vision due to its
remarkable model capability. However, most prevailing ViT models suffer from a huge number …

FacT: Factor-tuning for lightweight adaptation on vision transformer

S Jie, ZH Deng - Proceedings of the AAAI conference on artificial …, 2023 - ojs.aaai.org
Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by
updating only a few parameters so as to improve storage efficiency, called parameter …

MobileViTv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features

SN Wadekar, A Chaurasia - arXiv preprint arXiv:2209.15159, 2022 - arxiv.org
MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision
transformers (ViTs) to create light-weight models for mobile vision tasks. Though the main …

MixFormerV2: Efficient fully transformer tracking

Y Cui, T Song, G Wu, L Wang - Advances in Neural …, 2024 - proceedings.neurips.cc
Transformer-based trackers have achieved strong accuracy on the standard benchmarks.
However, their efficiency remains an obstacle to practical deployment on both GPU and …

I-ViT: Integer-only quantization for efficient vision transformer inference

Z Li, Q Gu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) have achieved state-of-the-art performance on various
computer vision applications. However, these models have considerable storage and …

RepQ-ViT: Scale reparameterization for post-training quantization of vision transformers

Z Li, J Xiao, L Yang, Q Gu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Post-training quantization (PTQ), which only requires a tiny dataset for calibration
without end-to-end retraining, is a light and practical model compression technique …

A good student is cooperative and reliable: CNN-transformer collaborative learning for semantic segmentation

J Zhu, Y Luo, X Zheng, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we strive to answer the question 'how to collaboratively learn convolutional
neural network (CNN)-based and vision transformer (ViT)-based models by selecting and …

Efficient high-resolution deep learning: A survey

A Bakhtiarnia, Q Zhang, A Iosifidis - ACM Computing Surveys, 2024 - dl.acm.org
Cameras in modern devices such as smartphones, satellites and medical equipment are
capable of capturing very high resolution images and videos. Such high-resolution data …

RIFormer: Keep your vision backbone effective but removing token mixer

J Wang, S Zhang, Y Liu, T Wu, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, such as self-attention in vision transformers (ViTs), are …