A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - Transactions of the Association for …, 2024 - direct.mit.edu
Large Language Models (LLMs) have successfully transformed natural language processing
tasks. Yet, their large size and high computational needs pose challenges for …

EfficientQAT: Efficient quantization-aware training for large language models

M Chen, W Shao, P Xu, J Wang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org
The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …

A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …

Model quantization and hardware acceleration for vision transformers: A comprehensive survey

D Du, G Gong, X Chu - arXiv preprint arXiv:2405.00314, 2024 - arxiv.org
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a
promising alternative to convolutional neural networks (CNNs) in several vision-related …

LLMC: Benchmarking large language model quantization with a versatile compression toolkit

R Gong, Y Yong, S Gu, Y Huang, C Lv… - Proceedings of the …, 2024 - aclanthology.org
Recent advancements in large language models (LLMs) are propelling us toward artificial
general intelligence with their remarkable emergent abilities and reasoning capabilities …

Scalable MatMul-free Language Modeling

RJ Zhu, Y Zhang, E Sifferman, T Sheaves… - arXiv preprint arXiv …, 2024 - arxiv.org
Matrix multiplication (MatMul) typically dominates the overall computational cost of large
language models (LLMs). This cost only grows as LLMs scale to larger embedding …

LPZero: Language model zero-cost proxy search from zero

P Dong, L Li, X Liu, Z Tang, X Liu, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite its outstanding performance, Neural Architecture Search (NAS) is criticized for its
massive computational cost. Recently, Zero-shot NAS has emerged as a promising approach by …

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

P Dong, L Li, Y Zhong, D Du, R Fan, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present the first structural binarization method for LLM compression to less
than 1-bit precision. Although LLMs have achieved remarkable performance, their memory …

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration

Z Mo, L Wang, J Wei, Z Zeng, S Cao, L Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language model (LLM) inference demands ever-greater resources, there is a rapidly
growing trend of using low-bit weights to shrink memory usage and boost inference …