A survey on model compression for large language models
Large Language Models (LLMs) have successfully transformed natural language processing tasks. Yet, their large size and high computational needs pose challenges for …
EfficientQAT: Efficient quantization-aware training for large language models
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …
Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …
A survey of low-bit large language models: Basics, systems, and algorithms
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …
Model quantization and hardware acceleration for vision transformers: A comprehensive survey
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a
promising alternative to convolutional neural networks (CNNs) in several vision-related …
LLMC: Benchmarking large language model quantization with a versatile compression toolkit
Recent advancements in large language models (LLMs) are propelling us toward artificial
general intelligence with their remarkable emergent abilities and reasoning capabilities …
Scalable MatMul-free Language Modeling
Matrix multiplication (MatMul) typically dominates the overall computational cost of large
language models (LLMs). This cost only grows as LLMs scale to larger embedding …
LPZero: Language model zero-cost proxy search from zero
Despite its outstanding performance, Neural Architecture Search (NAS) is criticized for its massive computational cost. Recently, Zero-shot NAS has emerged as a promising approach by …
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
In this paper, we present the first structural binarization method for LLM compression to less
than 1-bit precision. Although LLMs have achieved remarkable performance, their memory …
LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration
As large language model (LLM) inference demands ever-greater resources, there is a rapidly growing trend of using low-bit weights to shrink memory usage and boost inference …