A survey on transformer compression
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …
A review of the optimal design of neural networks based on FPGA
C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, autonomous driving, and other fields and …
Jumping through local minima: Quantization in the loss landscape of vision transformers
Quantization scale and bit-width are the most important parameters when considering how
to quantize a neural network. Prior work focuses on optimizing quantization scales in a …
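As an illustrative aside (not code from the paper above), a minimal sketch of symmetric uniform quantization shows how the two parameters the snippet names, scale and bit-width, jointly determine the representable range and the rounding error. The function name and example values are hypothetical.

```python
import numpy as np

def uniform_quantize(x, scale, bits=8):
    """Symmetric uniform quantization: scale to integer grid, round,
    clip to the signed range implied by the bit-width, then rescale."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale  # dequantized values; error = x - return value

# The same tensor at two bit-widths: fewer bits, coarser clipping range.
x = np.random.randn(4)
print(uniform_quantize(x, scale=0.05, bits=8))
print(uniform_quantize(x, scale=0.05, bits=4))
```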
Boost vision transformer with GPU-friendly sparsity and quantization
The transformer extends its success from the language to the vision domain. Because of the
numerous stacked self-attention and cross-attention blocks in the transformer, which involve …
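For context (a generic single-head sketch, not drawn from the paper), the cost of the stacked attention blocks the snippet refers to comes from the query-key score matrix over all token pairs; all names and sizes below are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: for n tokens of width d, the score
    matrix Q @ K.T is n x n, hence O(n^2 * d) time and O(n^2) memory."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

n, d = 197, 64  # e.g., a ViT-Base token count and per-head width
X = np.random.randn(n, d)
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # shape (197, 64)
```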
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking
Vision Transformer (ViT) architectures are becoming increasingly popular and widely
employed to tackle computer vision applications. Their main feature is the capacity to extract …
HeatViT: Hardware-efficient adaptive token pruning for vision transformers
While vision transformers (ViTs) have continuously achieved new milestones in the field of
computer vision, their sophisticated network architectures with high computation and …
ViTA: A vision transformer inference accelerator for edge applications
Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer,
have recently gained significant traction in computer vision tasks due to their ability to …
Efficient multimodal large language models: A survey
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated
remarkable performance in tasks such as visual question answering, visual understanding …
Edge-MoE: Memory-efficient multi-task vision transformer architecture with task-level sparsity via mixture-of-experts
The computer vision community is embracing two promising learning paradigms: the Vision
Transformer (ViT) and Multi-task Learning (MTL). ViT models show extraordinary …
EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture
While vision transformers (ViTs) have shown consistent progress in computer vision,
deploying them for real-time decision-making scenarios (< 1 ms) is challenging. Current …