A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …

Extreme compression of large language models via additive quantization

V Egiazarian, A Panferov, D Kuznedelev… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of accurate open large language models (LLMs) has led to a race towards
quantization techniques for such models enabling execution on end-user devices. In this …

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs …

F Wang, Z Zhang, X Zhang, Z Wu, T Mo, Q Lu… - arXiv preprint arXiv …, 2024 - ai.radensa.ru
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …

Inference optimization of foundation models on AI accelerators

Y Park, K Budhathoki, L Chen, JM Kübler… - Proceedings of the 30th …, 2024 - dl.acm.org
Powerful foundation models, including large language models (LLMs), with Transformer
architectures have ushered in a new era of Generative AI across various industries. Industry …

Spectra: Surprising effectiveness of pretraining ternary language models at scale

A Kaushal, T Vaidhya, AK Mondal, T Pandey… - arXiv preprint arXiv …, 2024 - arxiv.org
Rapid advancements in GPU computational power have outpaced memory capacity and
bandwidth growth, creating bottlenecks in Large Language Model (LLM) inference. Post …

AQUATIC-Diff: Additive Quantization for Truly Tiny Compressed Diffusion Models

A Hasan, T Peyrin - openreview.net
Tremendous investments have been made towards the commodification of diffusion models
for generation of diverse media. Their mass-market adoption is however still hobbled by the …