A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

A review of the optimal design of neural networks based on FPGA

C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, autonomous driving, and other fields, and …

Jumping through local minima: Quantization in the loss landscape of vision transformers

N Frumkin, D Gope… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Quantization scale and bit-width are the most important parameters when considering how
to quantize a neural network. Prior work focuses on optimizing quantization scales in a …
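
This snippet names quantization scale and bit-width as the two key parameters. As a rough illustration of how they interact (a minimal sketch of symmetric uniform quantization, assumed here as a generic scheme rather than this paper's specific method):

```python
import numpy as np

def quantize(x: np.ndarray, scale: float, bit_width: int) -> np.ndarray:
    """Symmetric uniform quantization: divide by the scale, round to the
    nearest integer level, clip to the signed range implied by bit_width,
    then dequantize so the error is visible in the original units."""
    qmax = 2 ** (bit_width - 1) - 1               # 127 for 8-bit
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# A small scale resolves fine weights but clips large ones; a large
# scale covers the full range but rounds small weights away.
w = np.array([-2.0, -0.03, 0.04, 0.9])
print(quantize(w, scale=0.01, bit_width=8))   # -2.0 is clipped to -1.28
print(quantize(w, scale=0.05, bit_width=8))   # -0.03 rounds to -0.05
```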

Boost vision transformer with GPU-friendly sparsity and quantization

C Yu, T Chen, Z Gan, J Fan - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The transformer has extended its success from the language domain to the vision domain. Because of the
numerous stacked self-attention and cross-attention blocks in the transformer, which involve …
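
The "GPU-friendly sparsity" in the title most plausibly refers to structured N:M sparsity such as NVIDIA's 2:4 pattern; that reading is an assumption, and the sketch below only illustrates the general pattern, not this paper's method:

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """2:4 structured sparsity: in every group of 4 consecutive weights,
    zero out the 2 with the smallest magnitude. This fixed pattern is what
    sparse tensor cores on recent GPUs can accelerate directly."""
    w = w.reshape(-1, 4).copy()
    idx = np.argsort(np.abs(w), axis=1)[:, :2]   # 2 smallest per group
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(-1)

print(prune_2_4(np.array([0.9, -0.1, 0.5, 0.05, -0.7, 0.2, 0.01, 1.1])))
# -> [ 0.9  0.   0.5  0.  -0.7  0.   0.   1.1]
```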

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

L Papa, P Russo, I Amerini… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformer (ViT) architectures are becoming increasingly popular and widely
employed to tackle computer vision applications. Their main feature is the capacity to extract …

HeatViT: Hardware-efficient adaptive token pruning for vision transformers

P Dong, M Sun, A Lu, Y Xie, K Liu… - … Symposium on High …, 2023 - ieeexplore.ieee.org
While vision transformers (ViTs) have continuously achieved new milestones in the field of
computer vision, their sophisticated network architectures with high computation and …
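
HeatViT's title points to adaptive token pruning. A common formulation of the general idea (assumed here for illustration, not HeatViT's actual mechanism) keeps only the patch tokens that receive the most attention from the [CLS] token:

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, cls_attn: np.ndarray, keep_ratio: float):
    """Keep the tokens with the highest [CLS]-attention scores.
    tokens:   (N, D) patch embeddings (excluding the [CLS] token)
    cls_attn: (N,)   attention weights from [CLS] to each token
    """
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(cls_attn)[-k:]          # indices of the top-k tokens
    return tokens[np.sort(keep)]              # preserve original token order

tokens = np.random.randn(196, 768)            # 14x14 patches, ViT-Base width
attn = np.random.rand(196)
print(prune_tokens(tokens, attn, keep_ratio=0.5).shape)  # (98, 768)
```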

ViTA: A vision transformer inference accelerator for edge applications

S Nag, G Datta, S Kundu… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer,
have recently gained significant traction in computer vision tasks due to their ability to …

Efficient multimodal large language models: A survey

Y Jin, J Li, Y Liu, T Gu, K Wu, Z Jiang, M He… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated
remarkable performance in tasks such as visual question answering, visual understanding …

Edge-MoE: Memory-efficient multi-task vision transformer architecture with task-level sparsity via mixture-of-experts

R Sarkar, H Liang, Z Fan, Z Wang… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
The computer vision community is embracing two promising learning paradigms: the Vision
Transformer (ViT) and Multi-task Learning (MTL). ViT models show extraordinary …
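
Edge-MoE's title refers to task-level sparsity via mixture-of-experts. As a generic sketch of MoE routing (the standard top-1 gating idea, not Edge-MoE's implementation), only one expert's parameters are active per input, which is where the sparsity comes from:

```python
import numpy as np

def moe_forward(x, experts, gate_w):
    """Top-1 mixture-of-experts: a learned gate picks one expert per input,
    so only a sparse subset of parameters is exercised at a time."""
    logits = x @ gate_w                       # (batch, num_experts)
    choice = logits.argmax(axis=-1)           # hard top-1 routing
    out = np.empty_like(x)
    for i, e in enumerate(choice):
        out[i] = experts[e](x[i])             # only the chosen expert runs
    return out

d, n_exp = 8, 4
experts = [lambda v, W=np.random.randn(d, d): v @ W for _ in range(n_exp)]
gate_w = np.random.randn(d, n_exp)
x = np.random.randn(3, d)
print(moe_forward(x, experts, gate_w).shape)  # (3, 8)
```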

EQ-ViT: Algorithm-hardware co-design for end-to-end acceleration of real-time vision transformer inference on Versal ACAP architecture

P Dong, J Zhuang, Z Yang, S Ji, Y Li… - … on Computer-Aided …, 2024 - ieeexplore.ieee.org
While vision transformers (ViTs) have shown consistent progress in computer vision,
deploying them for real-time decision-making scenarios (< 1 ms) is challenging. Current …