EfficientFormer: Vision transformers at MobileNet speed

Y Li, G Yuan, Y Wen, J Hu… - Advances in …, 2022 - proceedings.neurips.cc
Vision Transformers (ViT) have shown rapid progress in computer vision tasks,
achieving promising results on various benchmarks. However, due to the massive number of …

Rethinking vision transformers for MobileNet size and speed

Y Li, J Hu, Y Wen, G Evangelidis… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …

EdgeViTs: Competing light-weight CNNs on mobile devices with vision transformers

J Pan, A Bulat, F Tan, X Zhu, L Dudziak, H Li… - … on Computer Vision, 2022 - Springer
Self-attention based models such as vision transformers (ViTs) have emerged as a very
competitive architecture alternative to convolutional neural networks (CNNs) in computer …

RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently lightweight Vision Transformers (ViTs) demonstrate superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

FastViT: A fast hybrid vision transformer using structural reparameterization

PKA Vasu, J Gabriel, J Zhu, O Tuzel… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent amalgamation of transformer and convolutional designs has led to steady
improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a …

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

EdgeNeXt: Efficiently amalgamated CNN-transformer architecture for mobile vision applications

M Maaz, A Shaker, H Cholakkal, S Khan… - European conference on …, 2022 - Springer
In the pursuit of achieving ever-increasing accuracy, large and complex neural networks are
usually developed. Such models demand high computational resources and therefore …

Towards efficient vision transformer inference: A first study of transformers on mobile devices

X Wang, LL Zhang, Y Wang, M Yang - Proceedings of the 23rd annual …, 2022 - dl.acm.org
Convolution neural networks (CNNs) have long been dominating the model choice in
on-device intelligent mobile applications. Recently, we are witnessing the fast development of …

Separable self-attention for mobile vision transformers

S Mehta, M Rastegari - arXiv preprint arXiv:2206.02680, 2022 - arxiv.org
Mobile vision transformers (MobileViT) can achieve state-of-the-art performance across
several mobile vision tasks, including classification and detection. Though these models …

Patch slimming for efficient vision transformers

Y Tang, K Han, Y Wang, C Xu, J Guo… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper studies the efficiency problem for visual transformers by excavating redundant
calculation in given networks. The recent transformer architecture has demonstrated its …