A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

A review of the optimal design of neural networks based on FPGA

C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, autonomous driving, and other fields, and …

Jumping through local minima: Quantization in the loss landscape of vision transformers

N Frumkin, D Gope… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Quantization scale and bit-width are the most important parameters when considering how
to quantize a neural network. Prior work focuses on optimizing quantization scales in a …
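
This snippet names quantization scale and bit-width as the two key parameters. As a rough illustration of how they interact (a minimal sketch of symmetric uniform quantization, assumed here as a generic scheme rather than this paper's specific method):

```python
import numpy as np

def quantize(x: np.ndarray, scale: float, bit_width: int) -> np.ndarray:
    """Symmetric uniform quantization: divide by the scale, round to the
    nearest integer level, clip to the signed range implied by bit_width,
    then dequantize so the error is visible in the original units."""
    qmax = 2 ** (bit_width - 1) - 1               # 127 for 8-bit
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# A small scale resolves fine weights but clips large ones; a large
# scale covers the full range but rounds small weights away.
w = np.array([-2.0, -0.03, 0.04, 0.9])
print(quantize(w, scale=0.01, bit_width=8))   # -2.0 is clipped to -1.28
print(quantize(w, scale=0.05, bit_width=8))   # -0.03 rounds to -0.05
```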

Boost vision transformer with GPU-friendly sparsity and quantization

C Yu, T Chen, Z Gan, J Fan - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The transformer has extended its success from the language domain to the vision domain. Because of the
numerous stacked self-attention and cross-attention blocks in the transformer, which involve …
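
The "GPU-friendly sparsity" in the title most plausibly refers to structured N:M sparsity such as NVIDIA's 2:4 pattern; that reading is an assumption, and the sketch below only illustrates the general pattern, not this paper's method:

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """2:4 structured sparsity: in every group of 4 consecutive weights,
    zero out the 2 with the smallest magnitude. This fixed pattern is what
    sparse tensor cores on recent GPUs can accelerate directly."""
    w = w.reshape(-1, 4).copy()
    idx = np.argsort(np.abs(w), axis=1)[:, :2]   # 2 smallest per group
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(-1)

print(prune_2_4(np.array([0.9, -0.1, 0.5, 0.05, -0.7, 0.2, 0.01, 1.1])))
# -> [ 0.9  0.   0.5  0.  -0.7  0.   0.   1.1]
```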

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

L Papa, P Russo, I Amerini… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformer (ViT) architectures are becoming increasingly popular and widely
employed to tackle computer vision applications. Their main feature is the capacity to extract …

HeatViT: Hardware-efficient adaptive token pruning for vision transformers

P Dong, M Sun, A Lu, Y Xie, K Liu… - … Symposium on High …, 2023 - ieeexplore.ieee.org
While vision transformers (ViTs) have continuously achieved new milestones in the field of
computer vision, their sophisticated network architectures with high computation and …
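
HeatViT's title points to adaptive token pruning. A common formulation of the general idea (assumed here for illustration, not HeatViT's actual mechanism) keeps only the patch tokens that receive the most attention from the [CLS] token:

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, cls_attn: np.ndarray, keep_ratio: float):
    """Keep the tokens with the highest [CLS]-attention scores.
    tokens:   (N, D) patch embeddings (excluding the [CLS] token)
    cls_attn: (N,)   attention weights from [CLS] to each token
    """
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(cls_attn)[-k:]          # indices of the top-k tokens
    return tokens[np.sort(keep)]              # preserve original token order

tokens = np.random.randn(196, 768)            # 14x14 patches, ViT-Base width
attn = np.random.rand(196)
print(prune_tokens(tokens, attn, keep_ratio=0.5).shape)  # (98, 768)
```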

ViTA: A vision transformer inference accelerator for edge applications

S Nag, G Datta, S Kundu… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer,
have recently gained significant traction in computer vision tasks due to their ability to …

Efficient multimodal large language models: A survey

Y Jin, J Li, Y Liu, T Gu, K Wu, Z Jiang, M He… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated
remarkable performance in tasks such as visual question answering, visual understanding …

Edge-MoE: Memory-efficient multi-task vision transformer architecture with task-level sparsity via mixture-of-experts

R Sarkar, H Liang, Z Fan, Z Wang… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
The computer vision community is embracing two promising learning paradigms: the Vision
Transformer (ViT) and Multi-task Learning (MTL). ViT models show extraordinary …
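
Edge-MoE's title refers to task-level sparsity via mixture-of-experts. As a generic sketch of MoE routing (the standard top-1 gating idea, not Edge-MoE's implementation), only one expert's parameters are active per input, which is where the sparsity comes from:

```python
import numpy as np

def moe_forward(x, experts, gate_w):
    """Top-1 mixture-of-experts: a learned gate picks one expert per input,
    so only a sparse subset of parameters is exercised at a time."""
    logits = x @ gate_w                       # (batch, num_experts)
    choice = logits.argmax(axis=-1)           # hard top-1 routing
    out = np.empty_like(x)
    for i, e in enumerate(choice):
        out[i] = experts[e](x[i])             # only the chosen expert runs
    return out

d, n_exp = 8, 4
experts = [lambda v, W=np.random.randn(d, d): v @ W for _ in range(n_exp)]
gate_w = np.random.randn(d, n_exp)
x = np.random.randn(3, d)
print(moe_forward(x, experts, gate_w).shape)  # (3, 8)
```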

EQ-ViT: Algorithm-hardware co-design for end-to-end acceleration of real-time vision transformer inference on Versal ACAP architecture

P Dong, J Zhuang, Z Yang, S Ji, Y Li… - … on Computer-Aided …, 2024 - ieeexplore.ieee.org
While vision transformers (ViTs) have shown consistent progress in computer vision,
deploying them for real-time decision-making scenarios (< 1 ms) is challenging. Current …