A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - arXiv preprint arXiv:2308.07633, 2023 - arxiv.org
Large Language Models (LLMs) have revolutionized natural language processing tasks with
remarkable success. However, their formidable size and computational demands present …

OmniQuant: Omnidirectionally calibrated quantization for large language models

W Shao, M Chen, Z Zhang, P Xu, L Zhao, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing tasks.
However, their practical deployment is hindered by their immense memory and computation …

LoftQ: LoRA-fine-tuning-aware quantization for large language models

Y Li, Y Yu, C Liang, P He, N Karampatziakis… - arXiv preprint arXiv …, 2023 - arxiv.org
Quantization is an indispensable technique for serving Large Language Models (LLMs) and
has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where …

QA-LoRA: Quantization-aware low-rank adaptation of large language models

Y Xu, L Xie, X Gu, X Chen, H Chang, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed the rapid development of large language models (LLMs).
Despite their strong ability in many language-understanding tasks, the heavy computational …

A survey on Transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

SpikeGPT: Generative pre-trained language model with spiking neural networks

RJ Zhu, Q Zhao, G Li, JK Eshraghian - arXiv preprint arXiv:2302.13939, 2023 - arxiv.org
As the size of large language models continues to scale, so do the computational
resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

ReLU strikes back: Exploiting activation sparsity in large language models

I Mirzadeh, K Alizadeh, S Mehta, CC Del Mundo… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) with billions of parameters have drastically transformed AI
applications. However, their demanding computation during inference has raised significant …