A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

FACT: FFN-attention co-optimized transformer architecture with eager correlation prediction

Y Qin, Y Wang, D Deng, Z Zhao, X Yang, L Liu… - Proceedings of the 50th …, 2023 - dl.acm.org
The Transformer model is becoming prevalent in various AI applications thanks to its outstanding
performance. However, the high cost of computation and memory footprint make its …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

ShiftAddViT: Mixture of multiplication primitives towards efficient vision transformer

H You, H Shi, Y Guo, Y Lin - Advances in Neural …, 2024 - proceedings.neurips.cc
Vision Transformers (ViTs) have shown impressive performance and have become
a unified backbone for multiple vision tasks. However, both the attention mechanism and …

FlightLLM: Efficient large language model inference with a complete mapping flow on FPGAs

S Zeng, J Liu, G Dai, X Yang, T Fu, H Wang… - Proceedings of the …, 2024 - dl.acm.org
Transformer-based Large Language Models (LLMs) have made a significant impact on
various domains. However, LLMs' efficiency suffers from both heavy computation and …

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads

H Fan, SI Venieris, A Kouris, N Lane - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Running multiple deep neural networks (DNNs) in parallel has become an emerging
workload in both edge devices, such as mobile phones, where multiple tasks serve a single …

ETTE: Efficient tensor-train-based computing engine for deep neural networks

Y Gong, M Yin, L Huang, J Xiao, Y Sui, C Deng… - Proceedings of the 50th …, 2023 - dl.acm.org
Tensor-train (TT) decomposition enables ultra-high compression ratios, making deep
neural network (DNN) accelerators based on this method very attractive. TIE, the state-of-the …

Taskfusion: An efficient transfer learning architecture with dual delta sparsity for multi-task natural language processing

Z Fan, Q Zhang, P Abillama, S Shoouri, C Lee… - Proceedings of the 50th …, 2023 - dl.acm.org
The combination of pre-trained models and task-specific fine-tuning schemes, such as
BERT, has achieved great success in various natural language processing (NLP) tasks …

MELTing point: Mobile Evaluation of Language Transformers

S Laskaridis, K Kateveas, L Minto… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have revolutionized the machine learning landscape, gradually making their
way into everyday tasks and equipping our computers with "sparks of intelligence" …