LLM-Pruner: On the structural pruning of large language models
Large language models (LLMs) have shown remarkable capabilities in language
understanding and generation. However, such impressive capability typically comes with a …
Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
Learned token pruning for transformers
Efficient deployment of transformer models in practice is challenging due to their inference
cost, including memory footprint, latency, and power consumption, which scales quadratically …
Beyond efficiency: A systematic survey of resource-efficient large language models
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …
A survey on model compression and acceleration for pretrained language models
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and
long inference delay prevent Transformer-based pretrained language models (PLMs) from …
Diet code is healthy: Simplifying programs for pre-trained models of code
Pre-trained code representation models such as CodeBERT have demonstrated superior
performance in a variety of software engineering tasks, yet they are often heavy in …
Dynamic neural network structure: A review for its theories and applications
The dynamic neural network (DNN), in contrast to its static counterpart, offers numerous
advantages, such as improved accuracy, efficiency, and interpretability. These benefits stem …
Length-adaptive transformer: Train once with length drop, use anytime with search
Despite transformers' impressive accuracy, their computational cost often makes them
prohibitive to use with limited computational resources. Most previous approaches to improve inference …
Resource-efficient algorithms and systems of foundation models: A survey
Large foundation models, including large language models, vision transformers, diffusion models,
and LLM-based multimodal models, are revolutionizing the entire machine learning …
Transkimmer: Transformer learns to layer-wise skim
The Transformer architecture has become the de facto model for many machine learning tasks,
from natural language processing to computer vision. As such, improving its computational …