LLM-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness

O Friha, MA Ferrag, B Kantarci… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs) and Edge Intelligence (EI) introduces a
groundbreaking paradigm for intelligent edge devices. With their capacity for human-like …

LlamaFactory: Unified efficient fine-tuning of 100+ language models

Y Zheng, R Zhang, J Zhang, Y Ye, Z Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial efforts to implement these methods on different models. We …

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …

QuaRot: Outlier-free 4-bit inference in rotated LLMs

S Ashkboos, A Mohtashami, ML Croci, B Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to
quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot …

EfficientQAT: Efficient quantization-aware training for large language models

M Chen, W Shao, P Xu, J Wang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …

A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …

MARLIN: Mixed-precision auto-regressive parallel inference on large language models

E Frantar, RL Castro, J Chen, T Hoefler… - arXiv preprint arXiv …, 2024 - arxiv.org
As inference on Large Language Models (LLMs) emerges as an important workload in
machine learning applications, weight quantization has become a standard technique for …

Fast matrix multiplications for lookup table-quantized LLMs

H Guo, W Brandon, R Cholakov… - arXiv preprint arXiv …, 2024 - arxiv.org
The deployment of large language models (LLMs) is often constrained by memory
bandwidth, where the primary bottleneck is the cost of transferring model parameters from …

VPTQ: Extreme low-bit vector post-training quantization for large language models

Y Liu, J Wen, Y Wang, S Ye, LL Zhang, T Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling model size significantly challenges the deployment and inference of Large
Language Models (LLMs). Due to the redundancy in LLM weights, recent research has …

LLMC: Benchmarking large language model quantization with a versatile compression toolkit

R Gong, Y Yong, S Gu, Y Huang, C Lv… - Proceedings of the …, 2024 - aclanthology.org
Recent advancements in large language models (LLMs) are propelling us toward artificial
general intelligence with their remarkable emergent abilities and reasoning capabilities …