A-ViT: Adaptive tokens for efficient vision transformer

H Yin, A Vahdat, JM Alvarez, A Mallya… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformers
(ViT) for images of different complexity. A-ViT achieves this by automatically reducing the …
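
Judging from the title, the adaptation operates at the token level: tokens stop being processed once they are deemed uninformative, so simple images use fewer computations. The sketch below is a generic, hypothetical illustration of per-token halting with a random stand-in score (the function name, threshold, and stand-in block are all assumptions), not the exact mechanism of the cited paper, which learns its halting scores end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_with_token_halting(tokens: np.ndarray, num_layers: int, threshold: float = 1.0):
    """Process tokens layer by layer, dropping ones whose cumulative halting
    score passes the threshold, so easy inputs spend less compute.

    Hypothetical illustration only: the halting score here is random noise,
    whereas an adaptive-token method derives it from the token embeddings.
    """
    active = np.ones(tokens.shape[0], dtype=bool)     # tokens still being processed
    cumulative = np.zeros(tokens.shape[0])
    for _ in range(num_layers):
        # stand-in for a transformer block applied only to the active tokens
        tokens[active] = tokens[active] + 0.01
        # stand-in halting probability per active token
        cumulative[active] += rng.uniform(0.0, 0.4, size=active.sum())
        active &= cumulative < threshold              # halt tokens past the threshold
    return tokens, active

toks, still_active = run_with_token_halting(np.zeros((16, 64)), num_layers=12)
print(still_active.sum(), "of 16 tokens reach the final layer")
```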

A fast post-training pruning framework for transformers

W Kwon, S Kim, MW Mahoney… - Advances in …, 2022 - proceedings.neurips.cc
Pruning is an effective way to reduce the huge inference cost of Transformer models.
However, prior work on pruning Transformers requires retraining the models. This can add …
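
Post-training pruning scores structures such as attention heads using the already-trained weights and removes the lowest-ranked ones, skipping the retraining step. The snippet below is a minimal, hypothetical sketch using a simple weight-magnitude score (the function name and tensor shapes are assumptions); real post-training frameworks typically rely on gradient- or Fisher-based importance estimates rather than raw magnitudes.

```python
import numpy as np

def prune_heads_post_training(w_out: np.ndarray, num_heads: int, keep_ratio: float = 0.5):
    """Rank attention heads by a magnitude score and return a boolean keep-mask.

    w_out: output projection of shape (num_heads * head_dim, hidden_dim).
    Generic illustration only, not the cited paper's framework.
    """
    head_dim = w_out.shape[0] // num_heads
    per_head = w_out.reshape(num_heads, head_dim, -1)
    scores = np.linalg.norm(per_head, axis=(1, 2))    # one Frobenius-norm score per head
    k = max(1, int(round(keep_ratio * num_heads)))
    keep = np.zeros(num_heads, dtype=bool)
    keep[np.argsort(scores)[-k:]] = True              # keep the k highest-scoring heads
    return keep

# Example: 12 heads of dimension 64 with hidden size 768, keep half of them
mask = prune_heads_post_training(np.random.randn(12 * 64, 768), num_heads=12, keep_ratio=0.5)
print(mask.sum(), "heads kept")
```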

Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

A survey on green deep learning

J Xu, W Zhou, Z Fu, H Zhou, L Li - arXiv preprint arXiv:2111.05193, 2021 - arxiv.org
In recent years, larger and deeper models are springing up and continuously pushing state-
of-the-art (SOTA) results across various fields like natural language processing (NLP) and …

Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

An algorithm–hardware co-optimized framework for accelerating N:M sparse transformers

C Fang, A Zhou, Z Wang - IEEE Transactions on Very Large …, 2022 - ieeexplore.ieee.org
The Transformer has been an indispensable staple in deep learning. However, for real-life
applications, it is very challenging to deploy efficient Transformers due to the immense …
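
N:M sparsity keeps at most N nonzero weights in every contiguous group of M; the 2:4 pattern is the variant with widespread hardware support. As a rough, self-contained illustration of the sparsity pattern itself, assuming nothing about the cited co-optimized framework, the snippet below zeroes the smallest-magnitude entries in each group of four:

```python
import numpy as np

def enforce_n_m_sparsity(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Keep the n largest-magnitude entries in every group of m weights, zeroing the rest.

    Generic illustration of the N:M pattern (e.g. 2:4); not the cited framework.
    Assumes the total number of weights is divisible by m.
    """
    flat = weights.reshape(-1, m)                       # view the weights as groups of m
    # indices of the (m - n) smallest-magnitude entries within each group
    drop_idx = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    pruned = flat.copy()
    np.put_along_axis(pruned, drop_idx, 0.0, axis=1)    # zero out the dropped entries
    return pruned.reshape(weights.shape)

# Example: prune a 4x8 weight matrix to the 2:4 pattern
w = np.random.randn(4, 8)
w_sparse = enforce_n_m_sparsity(w, n=2, m=4)
assert (np.count_nonzero(w_sparse.reshape(-1, 4), axis=1) <= 2).all()
```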

DFX: A low-latency multi-FPGA appliance for accelerating transformer-based text generation

S Hong, S Moon, J Kim, S Lee, M Kim… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Transformer is a deep learning language model widely used for natural language
processing (NLP) services in datacenters. Among transformer models, Generative …

A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

Adaptable butterfly accelerator for attention-based NNs via hardware and algorithm co-design

H Fan, T Chau, SI Venieris, R Lee… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Attention-based neural networks have become pervasive in many AI tasks. Despite their
excellent algorithmic performance, the use of the attention mechanism and feedforward …