A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …

Resource-efficient Algorithms and Systems of Foundation Models: A Survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion models,
and LLM-based multimodal models, are revolutionizing the entire machine learning …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

PaCE: Parsimonious Concept Engineering for Large Language Models

J Luo, T Ding, KHR Chan, D Thaker… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are being used for a wide variety of tasks. While they are
capable of generating human-like responses, they can also produce undesirable output …

OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators

T Chen, T Ding, Z Zhu, Z Chen, HT Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Compressing a predefined deep neural network (DNN) into a compact sub-network with
competitive performance is crucial for efficient machine learning. This topic spans …
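
Since the snippet only names structured pruning without detailing OTOv3's pipeline, here is a minimal illustrative sketch of the general idea (magnitude-based channel pruning of a PyTorch conv layer), not the paper's algorithm; the function name and keep_ratio parameter are hypothetical.

```python
# Sketch of structured (channel) pruning, NOT the OTOv3 method: rank a conv
# layer's output channels by L1 weight norm and rebuild a slimmer layer.
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    # Saliency of each output channel = L1 norm of that channel's filter.
    saliency = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.topk(saliency, n_keep).indices.sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])   # keep only salient filters
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(3, 16, 3, padding=1)
slim = prune_conv_channels(conv, keep_ratio=0.25)   # 16 -> 4 output channels
print(slim)
# Note: in a real network, the next layer's input channels must be pruned to
# match; handling such cross-layer dependencies automatically is what
# architecture-agnostic methods like OTOv3 aim to solve.
```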

HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning

T Chen, X Qu, D Aponte, C Banbury, J Ko… - arXiv preprint arXiv …, 2024 - arxiv.org
Structured pruning is one of the most popular approaches for compressing heavy
deep neural networks (DNNs) into compact sub-networks while retaining performance. The …

A Survey on Large Language Model Acceleration based on KV Cache Management

H Li, Y Li, A Tian, T Tang, Z Xu, X Chen, N Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized a wide range of domains such as
natural language processing, computer vision, and multi-modal tasks due to their ability to …
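
For readers unfamiliar with the KV cache the survey is organized around, the following is a minimal sketch of the underlying idea (single-head attention with a growing key/value cache), assuming toy dimensions; it does not reflect any specific management technique from the survey.

```python
# Sketch of the KV cache idea: during autoregressive decoding, each token's
# keys/values are stored once and reused, so a decode step attends over the
# prefix without recomputing its projections.
import torch
import torch.nn.functional as F

d = 64
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []          # grows by one entry per generated token

def decode_step(x_t: torch.Tensor) -> torch.Tensor:
    """One attention step for the newest token embedding x_t (shape [d])."""
    q = x_t @ W_q
    k_cache.append(x_t @ W_k)      # cache this token's key ...
    v_cache.append(x_t @ W_v)      # ... and value for all future steps
    K = torch.stack(k_cache)       # [t, d]: keys of every token so far
    V = torch.stack(v_cache)
    attn = F.softmax(q @ K.T / d ** 0.5, dim=-1)
    return attn @ V                # attention output for the new token

for _ in range(5):                 # each step costs O(t), not O(t^2)
    out = decode_step(torch.randn(d))
print(len(k_cache), out.shape)     # 5 torch.Size([64])
```

Because this cache grows linearly with sequence length, its memory footprint is what the surveyed management techniques (eviction, quantization, paging, etc.) try to control.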

Arlo: Serving Transformer-based Language Models with Dynamic Input Lengths

X Tan, J Li, Y Yang, J Li, H Xu - … of the 53rd International Conference on …, 2024 - dl.acm.org
A prominent challenge in serving requests for NLP tasks is handling the varying length of
input texts. Existing solutions, such as uniform zero-padding and compiler support, suffer …
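
To make the padding problem the snippet raises concrete, here is an illustrative sketch (not Arlo's actual scheduler) contrasting uniform zero-padding with length-bucketed batching, where each batch pads only to its own maximum; bucket_width and batch_size are hypothetical parameters.

```python
# Sketch of length-bucketed batching: group requests by similar length so
# padding waste stays local to each batch, instead of uniform zero-padding
# every request to the global maximum length.
from collections import defaultdict

def bucket_batches(requests, bucket_width=16, batch_size=8):
    """requests: list of token-id lists with varying lengths."""
    buckets = defaultdict(list)
    for seq in requests:
        buckets[(len(seq) - 1) // bucket_width].append(seq)

    for _, seqs in sorted(buckets.items()):
        for i in range(0, len(seqs), batch_size):
            batch = seqs[i:i + batch_size]
            max_len = max(len(s) for s in batch)   # pad within batch only
            yield [s + [0] * (max_len - len(s)) for s in batch]

reqs = [[1] * n for n in (3, 5, 17, 19, 40)]
for batch in bucket_batches(reqs):
    print(len(batch), "seqs padded to length", len(batch[0]))
```

With uniform zero-padding, all five toy requests above would be padded to length 40; bucketing pads them only to 5, 19, and 40 within their respective batches.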

A Text Classification Model Combining Adversarial Training with Pre-trained Language Models and Neural Networks: A Case Study on Telecom Fraud Incident Texts

L Zhuoxian, S Tuo, H Xiaofeng - arXiv preprint arXiv:2411.06772, 2024 - arxiv.org
Front-line police officers often categorize reported telecom fraud cases from police calls into
14 subcategories to facilitate targeted prevention measures, such as precise public …

Efficiency in Language Understanding and Generation: An Evaluation of Four Open-Source Large Language Models

SM Wong, H Leung, KY Wong - 2024 - researchsquare.com
This study provides a comprehensive evaluation of the efficiency of Large Language Models
(LLMs) in performing diverse language understanding and generation tasks. Through a …