EfficientViT: Memory efficient vision transformer with cascaded group attention
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …
Faster Segment Anything: Towards lightweight SAM for mobile applications
The Segment Anything Model (SAM) is a prompt-guided vision foundation model for cutting out
the object of interest from its background. Since the Meta research team released the SA project …
EfficientSAM: Leveraged masked image pretraining for efficient segment anything
Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for zero-shot …
Distilling large vision-language model with out-of-distribution generalizability
Large vision-language models have achieved outstanding performance, but their size and
computational requirements make their deployment on resource-constrained devices and …
Exploring lightweight hierarchical vision transformers for efficient visual tracking
Transformer-based visual trackers have demonstrated significant progress owing to their
superior modeling capabilities. However, existing trackers are hampered by low speed …
A survey on transformer compression
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …
TinyCLIP: CLIP distillation via affinity mimicking and weight inheritance
In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-
scale language-image pre-trained models. The method introduces two core techniques …
A survey of the vision transformers and their CNN-transformer based variants
Vision transformers have become popular as a possible substitute for convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …
Logit standardization in knowledge distillation
Knowledge distillation involves transferring soft labels from a teacher to a student
using a shared temperature-based softmax function. However, the assumption of a shared …
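The temperature-based softmax transfer that this entry describes can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation; the z-score `standardize` helper stands in for the logit-standardization idea in the title, and all function names here are my own.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax with the usual max-subtraction for stability.
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def standardize(z, eps=1e-8):
    # Z-score standardization of logits (zero mean, unit std) before softening,
    # standing in for the paper's logit-standardization preprocessing.
    return (z - z.mean()) / (z.std() + eps)

def kd_kl(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) between temperature-softened distributions,
    # the quantity a distillation loss would minimize.
    p = softmax(standardize(teacher_logits), T)
    q = softmax(standardize(student_logits), T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

t = np.array([4.0, 1.0, 0.5])   # teacher logits (toy values)
s = np.array([3.0, 1.5, 0.2])   # student logits (toy values)
loss = kd_kl(t, s)              # non-negative; zero iff distributions match
```

Standardizing both logit vectors removes the scale mismatch that a single shared temperature would otherwise have to absorb.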
DiffRate: Differentiable compression rate for efficient vision transformers
Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning
(dropping) or merging tokens. It is an important but challenging task. Although recent …
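The pruning half of token compression mentioned in this entry can be sketched as follows. This is a minimal NumPy illustration, not DiffRate's differentiable mechanism; the function name and the source of the importance scores are assumptions.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    # tokens: (N, D) token embeddings; scores: (N,) per-token importance
    # (e.g. derived from attention weights). Keeps the top keep_ratio
    # fraction of tokens by score, preserving their original order.
    n_keep = max(1, int(round(len(tokens) * keep_ratio)))
    idx = np.argsort(scores)[::-1][:n_keep]  # indices of highest scores
    return tokens[np.sort(idx)]

x = np.arange(12.0).reshape(4, 3)    # 4 tokens, 3-dim embeddings
s = np.array([0.1, 0.9, 0.5, 0.2])   # toy importance scores
kept = prune_tokens(x, s, keep_ratio=0.5)  # keeps tokens 1 and 2
```

Merging, the other option the abstract names, would instead combine similar tokens (e.g. by averaging nearest neighbors) rather than discarding them.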