A survey of large language models
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
A simple and effective pruning approach for large language models
As their size increases, Large Language Models (LLMs) are natural candidates for network
pruning methods: approaches that drop a subset of network weights while striving to …
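The entry above describes pruning as dropping a subset of network weights. As a rough illustration only (plain unstructured magnitude pruning, not the paper's activation-aware criterion), that idea can be sketched as:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Toy sketch of unstructured magnitude pruning; real LLM pruning
    methods typically use richer importance scores.
    """
    k = int(len(weights) * sparsity)
    # indices of the k smallest |w| are dropped (set to zero)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]
```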
H2o: Heavy-hitter oracle for efficient generative inference of large language models
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
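The "heavy-hitter" idea behind this entry is to keep only the KV-cache positions that receive the most attention. A minimal sketch, assuming accumulated attention mass as the importance score (H2O's actual eviction policy is more involved):

```python
def select_heavy_hitters(attn_scores, budget):
    """Return the cache positions with the largest accumulated attention
    mass. attn_scores is a list of rows, one row of per-position scores
    per query step. Simplified sketch, not H2O's exact policy.
    """
    # total attention mass received by each cached position
    totals = [sum(col) for col in zip(*attn_scores)]
    ranked = sorted(range(len(totals)), key=lambda i: -totals[i])
    return sorted(ranked[:budget])
```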
A survey on model compression for large language models
Large Language Models (LLMs) have revolutionized natural language processing tasks with
remarkable success. However, their formidable size and computational demands present …
Omniquant: Omnidirectionally calibrated quantization for large language models
Large language models (LLMs) have revolutionized natural language processing tasks.
However, their practical deployment is hindered by their immense memory and computation …
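For context on what quantization reduces, here is plain symmetric per-tensor int8 quantization, a generic baseline rather than OmniQuant's learned clipping and equivalent transformations:

```python
def quantize_int8(xs):
    """Symmetric per-tensor int8 quantization (generic baseline sketch).

    Returns the quantized values and the scale needed to dequantize.
    """
    scale = max(abs(x) for x in xs) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate floating-point values."""
    return [v * scale for v in q]
```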
Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization
Large language models (LLMs) face challenges in fine-tuning and deployment due to
their high memory demands and computational costs. While parameter-efficient fine-tuning …
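Parameter-efficient fine-tuning methods such as LoRA, mentioned in this entry, learn a low-rank update B @ A on top of a frozen weight matrix. A toy sketch of the effective weight delta (generic LoRA, not the paper's sub-4-bit quantized variant):

```python
def matmul(A, B):
    """Plain nested-list matrix multiply for illustration."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_delta(B, A, alpha):
    """Effective weight update alpha * (B @ A); the rank is the inner
    dimension shared by B and A, which is kept small."""
    return [[alpha * v for v in row] for row in matmul(B, A)]
```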
A survey on transformer compression
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …
Outlier suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Post-training quantization~(PTQ) of transformer language models faces significant
challenges due to the existence of detrimental outliers in activations. We observe that these …
Quantizable transformers: Removing outliers by helping attention heads do nothing
Y Bondarenko, M Nagel… - Advances in Neural …, 2024 - proceedings.neurips.cc
Transformer models have been widely adopted in various domains over the last years and
especially large language models have advanced the field of AI significantly. Due to their …
Qa-lora: Quantization-aware low-rank adaptation of large language models
Recent years have witnessed a rapid development of large language models (LLMs).
Despite the strong ability in many language-understanding tasks, the heavy computational …