EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

TinyViT: Fast pretraining distillation for small vision transformers

K Wu, J Zhang, H Peng, M Liu, B Xiao, J Fu… - European conference on …, 2022 - Springer
Vision transformer (ViT) has recently drawn great attention in computer vision due to its
remarkable model capability. However, most prevailing ViT models suffer from a huge number …

FacT: Factor-tuning for lightweight adaptation on vision transformer

S Jie, ZH Deng - Proceedings of the AAAI conference on artificial …, 2023 - ojs.aaai.org
Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by
updating only a few parameters so as to improve storage efficiency, called parameter …

MobileViTv3: Mobile-friendly vision transformer with simple and effective fusion of local, global and input features

SN Wadekar, A Chaurasia - arXiv preprint arXiv:2209.15159, 2022 - arxiv.org
MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision
transformers (ViTs) to create light-weight models for mobile vision tasks. Though the main …

MixFormerV2: Efficient fully transformer tracking

Y Cui, T Song, G Wu, L Wang - Advances in Neural …, 2024 - proceedings.neurips.cc
Transformer-based trackers have achieved strong accuracy on the standard benchmarks.
However, their efficiency remains an obstacle to practical deployment on both GPU and …

I-ViT: Integer-only quantization for efficient vision transformer inference

Z Li, Q Gu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) have achieved state-of-the-art performance on various
computer vision applications. However, these models have considerable storage and …

RepQ-ViT: Scale reparameterization for post-training quantization of vision transformers

Z Li, J Xiao, L Yang, Q Gu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Post-training quantization (PTQ), which only requires a tiny dataset for calibration
without end-to-end retraining, is a light and practical model compression technique …

A good student is cooperative and reliable: CNN-transformer collaborative learning for semantic segmentation

J Zhu, Y Luo, X Zheng, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we strive to answer the question 'how to collaboratively learn convolutional
neural network (CNN)-based and vision transformer (ViT)-based models by selecting and …

Efficient high-resolution deep learning: A survey

A Bakhtiarnia, Q Zhang, A Iosifidis - ACM Computing Surveys, 2024 - dl.acm.org
Cameras in modern devices such as smartphones, satellites and medical equipment are
capable of capturing very high resolution images and videos. Such high-resolution data …

RIFormer: Keep your vision backbone effective but removing token mixer

J Wang, S Zhang, Y Liu, T Wu, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, such as self-attention in vision transformers (ViTs), are …