A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Run, don't walk: chasing higher FLOPS for faster neural networks

J Chen, S Kao, H He, W Zhuo, S Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com
To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

Rethinking vision transformers for mobilenet size and speed

Y Li, J Hu, Y Wen, G Evangelidis… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to
optimize the performance and complexity of ViTs to enable efficient deployment on mobile …

RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

Spike-driven transformer

M Yao, J Hu, Z Zhou, L Yuan, Y Tian… - Advances in neural …, 2024 - proceedings.neurips.cc
Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option
due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we …

SnapFusion: Text-to-image diffusion model on mobile devices within two seconds

Y Li, H Wang, Q Jin, J Hu… - Advances in …, 2024 - proceedings.neurips.cc
Text-to-image diffusion models can create stunning images from natural language
descriptions that rival the work of professional artists and photographers. However, these …

FastViT: A fast hybrid vision transformer using structural reparameterization

PKA Vasu, J Gabriel, J Zhu, O Tuzel… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent amalgamation of transformer and convolutional designs has led to steady
improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a …

Faster Segment Anything: Towards lightweight SAM for mobile applications

C Zhang, D Han, Y Qiao, JU Kim, SH Bae… - arXiv preprint arXiv …, 2023 - arxiv.org
Segment Anything Model (SAM) is a prompt-guided vision foundation model for cutting out
the object of interest from its background. Since Meta research team released the SA project …

EfficientSAM: Leveraged masked image pretraining for efficient segment anything

Y Xiong, B Varadarajan, L Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for zero-shot …