LLM-Pruner: On the structural pruning of large language models

X Ma, G Fang, X Wang - Advances in neural information …, 2023 - proceedings.neurips.cc
Large language models (LLMs) have shown remarkable capabilities in language
understanding and generation. However, such impressive capability typically comes with a …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

Learned token pruning for transformers

S Kim, S Shen, D Thorsley, A Gholami… - Proceedings of the 28th …, 2022 - dl.acm.org
Efficient deployment of transformer models in practice is challenging due to their inference
cost, including memory footprint, latency, and power consumption, which scales quadratically …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

A survey on model compression and acceleration for pretrained language models

C Xu, J McAuley - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and
long inference delay prevent Transformer-based pretrained language models (PLMs) from …

Diet code is healthy: Simplifying programs for pre-trained models of code

Z Zhang, H Zhang, B Shen, X Gu - Proceedings of the 30th ACM Joint …, 2022 - dl.acm.org
Pre-trained code representation models such as CodeBERT have demonstrated superior
performance in a variety of software engineering tasks, yet they are often heavy in …

Dynamic neural network structure: A review for its theories and applications

J Guo, CLP Chen, Z Liu, X Yang - IEEE Transactions on Neural …, 2024 - ieeexplore.ieee.org
The dynamic neural network (DNN), in contrast to its static counterpart, offers numerous
advantages, such as improved accuracy, efficiency, and interpretability. These benefits stem …

Length-adaptive transformer: Train once with length drop, use anytime with search

G Kim, K Cho - arXiv preprint arXiv:2010.07003, 2020 - arxiv.org
Despite transformers' impressive accuracy, their computational cost is often prohibitive for
use with limited computational resources. Most previous approaches to improve inference …

Resource-efficient Algorithms and Systems of Foundation Models: A Survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and LLM-based multimodal models, are revolutionizing the entire machine learning …

Transkimmer: Transformer learns to layer-wise skim

Y Guan, Z Li, J Leng, Z Lin, M Guo - arXiv preprint arXiv:2205.07324, 2022 - arxiv.org
Transformer architecture has become the de facto model for many machine learning tasks
in natural language processing and computer vision. As such, improving its computational …