Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org
The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …

BitDistiller: Unleashing the potential of sub-4-bit LLMs via self-distillation

D Du, Y Zhang, S Cao, J Guo, T Cao, X Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
The upscaling of Large Language Models (LLMs) has yielded impressive advances in
natural language processing, yet it also poses significant deployment challenges. Weight …

HQ-DiT: Efficient diffusion transformer with FP4 hybrid quantization

W Liu, SQ Zhang - arXiv preprint arXiv:2405.19751, 2024 - arxiv.org
Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial
and academic fields for their superior visual generation capabilities, outperforming …

DAQ: Density-Aware Post-Training Weight-Only Quantization for LLMs

Y Luo, L Chen - arXiv preprint arXiv:2410.12187, 2024 - arxiv.org
Large language models (LLMs) excel in various tasks but face deployment challenges due
to hardware constraints. We propose density-aware post-training weight-only quantization …