A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - arXiv preprint arXiv:2308.07633, 2023 - arxiv.org
Large Language Models (LLMs) have revolutionized natural language processing tasks with
remarkable success. However, their formidable size and computational demands present …

OmniQuant: Omnidirectionally calibrated quantization for large language models

W Shao, M Chen, Z Zhang, P Xu, L Zhao, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing tasks.
However, their practical deployment is hindered by their immense memory and computation …

LoftQ: LoRA-fine-tuning-aware quantization for large language models

Y Li, Y Yu, C Liang, P He, N Karampatziakis… - arXiv preprint arXiv …, 2023 - arxiv.org
Quantization is an indispensable technique for serving Large Language Models (LLMs) and
has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where …

QA-LoRA: Quantization-aware low-rank adaptation of large language models

Y Xu, L Xie, X Gu, X Chen, H Chang, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed the rapid development of large language models (LLMs).
Despite their strong ability in many language-understanding tasks, the heavy computational …

A survey on Transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

SpikeGPT: Generative pre-trained language model with spiking neural networks

RJ Zhu, Q Zhao, G Li, JK Eshraghian - arXiv preprint arXiv:2302.13939, 2023 - arxiv.org
As the size of large language models continues to scale, so do the computational
resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

ReLU strikes back: Exploiting activation sparsity in large language models

I Mirzadeh, K Alizadeh, S Mehta, CC Del Mundo… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) with billions of parameters have drastically transformed AI
applications. However, their demanding computation during inference has raised significant …