LLM-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness
The integration of Large Language Models (LLMs) and Edge Intelligence (EI) introduces a
groundbreaking paradigm for intelligent edge devices. With their capacity for human-like …
LlamaFactory: Unified efficient fine-tuning of 100+ language models
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial effort to implement these methods on different models. We …
SqueezeLLM: Dense-and-sparse quantization
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
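The dense-and-sparse decomposition named in the title can be illustrated in a few lines: split off the handful of largest-magnitude weights into a sparse full-precision matrix and quantize the well-behaved remainder at low bit-width. This is only a sketch of the decomposition idea — SqueezeLLM itself pairs it with sensitivity-weighted non-uniform codebooks, which the uniform quantizer below stands in for, and the function names are illustrative:

```python
import numpy as np

def quantize_sym(x, bits=4):
    # per-tensor symmetric uniform quantization (a stand-in for the
    # paper's non-uniform, sensitivity-weighted codebooks)
    qmax = 2 ** (bits - 1) - 1
    m = np.abs(x).max()
    scale = m / qmax if m > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def dense_and_sparse(W, bits=4, outlier_frac=0.005):
    # keep the largest-magnitude entries exactly (sparse part) and
    # quantize the dense remainder, whose range is now much tighter
    k = max(1, int(outlier_frac * W.size))
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    sparse = np.where(np.abs(W) >= thresh, W, 0.0)
    dense = W - sparse
    return quantize_sym(dense, bits) + sparse

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))
W.ravel()[rng.choice(W.size, 20, replace=False)] = 25.0  # a few outliers

err_naive = np.mean((quantize_sym(W) - W) ** 2)
err_ds = np.mean((dense_and_sparse(W) - W) ** 2)
print(err_naive, err_ds)  # removing outliers shrinks the dense range
```

Because the outliers no longer dictate the quantization scale, the dense part quantizes with far lower error, at the cost of storing a tiny sparse matrix in full precision.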
QuaRot: Outlier-free 4-bit inference in rotated LLMs
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to
quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot …
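The rotation idea can be sketched briefly: multiplying the weights by an orthogonal matrix (e.g. a normalized Hadamard matrix) spreads outlier channels across all coordinates, so one quantization scale fits the whole tensor, and the rotation is undone exactly since Q⁻¹ = Qᵀ. The NumPy sketch below only illustrates a one-sided weight rotation — QuaRot's actual pipeline fuses rotations into adjacent layers and also covers activations and the KV cache:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # rows are orthonormal

def quantize_sym(x, bits=4):
    # per-tensor symmetric quantization to a 4-bit grid
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W[0, 0] = 120.0  # inject one outlier weight

Q = hadamard(64)
# y = W x = (W Q^T)(Q x): rotate weights offline, activations online
W_rot = W @ Q.T

err_plain = np.mean((quantize_sym(W) - W) ** 2)
err_rot = np.mean((quantize_sym(W_rot) @ Q - W) ** 2)
print(err_plain, err_rot)  # the rotated weights quantize more accurately
```

With the outlier present, the per-tensor scale of the unrotated matrix is so coarse that ordinary weights collapse to zero; after rotation the outlier's energy is spread thin and the same 4-bit grid fits everything.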
EfficientQAT: Efficient quantization-aware training for large language models
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …
A survey of low-bit large language models: Basics, systems, and algorithms
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …
MARLIN: Mixed-precision auto-regressive parallel inference on large language models
As inference on Large Language Models (LLMs) emerges as an important workload in
machine learning applications, weight quantization has become a standard technique for …
Fast matrix multiplications for lookup table-quantized LLMs
H Guo, W Brandon, R Cholakov… - arXiv preprint arXiv …, 2024 - arxiv.org
The deployment of large language models (LLMs) is often constrained by memory
bandwidth, where the primary bottleneck is the cost of transferring model parameters from …
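The representation behind lookup-table quantization is simple to sketch: weights are stored as small integer codes, and a compact per-matrix table maps each code to a full-precision value, so only the codes — a few bits each instead of 16 — have to cross the memory hierarchy. The sketch below shows the representation and a dequantize-on-the-fly matmul only; the paper's contribution is the fused GPU kernel that makes this fast, which plain NumPy does not model. All names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

bits = 3
lut = np.sort(rng.normal(size=2 ** bits))  # 2^3 = 8 representative values
# weights stored as 3-bit codes (held in uint8 here for simplicity);
# at 3 bits per weight this is roughly 5x less traffic than fp16
codes = rng.integers(0, 2 ** bits, size=(16, 64)).astype(np.uint8)
x = rng.normal(size=64)

# dequantize via one table lookup per weight, then multiply
y = lut[codes] @ x
```

A real kernel would keep the table in fast on-chip memory and fuse the lookup into the multiply, so the full-precision weight matrix never materializes.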
VPTQ: Extreme low-bit vector post-training quantization for large language models
Scaling model size significantly challenges the deployment and inference of Large
Language Models (LLMs). Due to the redundancy in LLM weights, recent research has …
LLMC: Benchmarking large language model quantization with a versatile compression toolkit
Recent advancements in large language models (LLMs) are propelling us toward artificial
general intelligence with their remarkable emergent abilities and reasoning capabilities …