Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org
The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …
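Low-bit formats such as 4-bit integers typically have no native storage type, so they are packed into wider types before hardware can operate on them. The sketch below is a minimal, generic illustration of that packing step in NumPy; it is an assumption-based example, not Ladder's actual tensor transformation.

```python
import numpy as np

def pack_int4(values: np.ndarray) -> np.ndarray:
    """Pack signed 4-bit integers (range [-8, 7]) two per byte into uint8."""
    assert values.size % 2 == 0, "pad to an even length before packing"
    v = (values.astype(np.int8) & 0x0F).astype(np.uint8)
    return (v[0::2] | (v[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Recover signed 4-bit integers from their packed uint8 storage."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    # Sign-extend the 4-bit values.
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return out

weights = np.array([-8, 7, 3, -1, 0, 5, -4, 2], dtype=np.int8)
assert np.array_equal(unpack_int4(pack_int4(weights)), weights)
```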

Numerical Accuracy Matters: Applications of Machine Learned Potential Energy Surfaces

S Käser, M Meuwly - The Journal of Physical Chemistry Letters, 2024 - ACS Publications
The role of numerical accuracy in training and evaluating neural network-based potential
energy surfaces is examined for different experimental observables. For observables that …
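As a self-contained reminder (not taken from the paper) of why numerical precision can matter when many small contributions are accumulated, the following snippet compares float32 and float64 summation of a large number of small terms.

```python
import numpy as np

rng = np.random.default_rng(0)
terms = rng.normal(scale=1e-4, size=10_000_000)   # many small contributions

sum64 = np.sum(terms, dtype=np.float64)
sum32 = float(np.sum(terms.astype(np.float32), dtype=np.float32))

print(f"float64 sum: {sum64:.10f}")
print(f"float32 sum: {sum32:.10f}")
print(f"absolute difference: {abs(sum64 - sum32):.2e}")
```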

Achieving Peak Performance for Large Language Models: A Systematic Review

ZRK Rostam, S Szénási, G Kertész - IEEE Access, 2024 - ieeexplore.ieee.org
In recent years, large language models (LLMs) have achieved remarkable success in
natural language processing (NLP). LLMs require an extremely large number of parameters to …

Accurate Block Quantization in LLMs with Outliers

N Trukhanov, I Soloveychik - arXiv preprint arXiv:2403.20137, 2024 - arxiv.org
The demand for inference on extremely large-scale LLMs has grown enormously in recent
months, making evident the colossal shortage of dedicated hardware capable of …
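To make the outlier problem concrete, the sketch below shows plain absmax block quantization, where a single large value in a block inflates the shared scale and wipes out resolution for the remaining elements. It is a generic illustration of the problem this entry targets, not the paper's proposed method.

```python
import numpy as np

def quantize_block_int8(block: np.ndarray):
    """Absmax block quantization: one FP scale shared by the whole block."""
    scale = np.max(np.abs(block)) / 127.0
    q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# A block of small weights plus one outlier: the outlier inflates the scale,
# so the remaining values land on only a few integer levels.
block = np.array([0.01, -0.02, 0.015, 0.03, 12.0], dtype=np.float32)
q, scale = quantize_block_int8(block)
print("scale:", scale)
print("reconstruction error:", np.abs(dequantize(q, scale) - block))
```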

ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws

R Li, Y Wei, M Zhang, N Yu, H Hu, H Peng - arXiv preprint arXiv …, 2024 - arxiv.org
High-quality data is crucial for the pre-training performance of large language models.
Unfortunately, existing quality filtering methods rely on a known high-quality dataset as …

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

H Xi, Y Chen, K Zhao, K Zheng, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretraining transformers is generally time-consuming. Fully quantized training (FQT) is a
promising approach to speed up pretraining. However, most FQT methods adopt a quantize …
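The snippet below sketches the generic quantize-compute-dequantize pattern for an INT8 matrix multiplication with int32 accumulation. For simplicity it uses per-row and per-column scales rather than a true 2D per-block scheme, so treat it as an illustrative assumption rather than Jetfire's actual data flow.

```python
import numpy as np

def quantize_rows_int8(x: np.ndarray):
    """Per-row absmax quantization: one scale per row of the matrix."""
    scales = np.max(np.abs(x), axis=1, keepdims=True) / 127.0
    scales = np.maximum(scales, 1e-8)           # avoid division by zero
    q = np.clip(np.round(x / scales), -127, 127).astype(np.int8)
    return q, scales

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Quantize -> integer matmul -> dequantize (a: M x K, b: K x N)."""
    qa, sa = quantize_rows_int8(a)              # per-row scales of a
    qb, sb = quantize_rows_int8(b.T)            # per-column scales of b
    acc = qa.astype(np.int32) @ qb.T.astype(np.int32)   # int32 accumulation
    return acc.astype(np.float32) * sa * sb.T   # rescale back to float

a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 3).astype(np.float32)
print(np.max(np.abs(int8_matmul(a, b) - a @ b)))  # small quantization error
```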

Exploring Quantization for Efficient Pre-Training of Transformer Language Models

K Chitsaz, Q Fournier, G Mordido… - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing scale of Transformer models has led to an increase in their pre-training
computational requirements. While quantization has proven to be effective after pre-training …
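A common way to keep quantization inside the pre-training graph is fake quantization with a straight-through estimator, sketched below in PyTorch. This is a generic quantization-aware-training pattern, not necessarily the scheme explored in the paper.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Symmetric int8 fake quantization with a straight-through gradient."""

    @staticmethod
    def forward(ctx, x):
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        return torch.clamp(torch.round(x / scale), -127, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat rounding as the identity.
        return grad_output

w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(4, 16)
y = x @ FakeQuantSTE.apply(w)   # weights are quantized in the forward pass
y.sum().backward()              # gradients flow to the full-precision weights
print(w.grad.shape)
```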

LoCo: Low-Bit Communication Adaptor for Large-scale Model Training

X Xie, Z Lin, KC Toh, P Zhou - arXiv preprint arXiv:2407.04480, 2024 - arxiv.org
To efficiently train large-scale models, low-bit gradient communication compresses full-
precision gradients on local GPU nodes into low-precision ones for higher gradient …
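As a generic picture of what low-bit gradient communication involves, the sketch below quantizes a local gradient to int8 with a single scale and carries the rounding error into the next step via an error-feedback buffer. The error-feedback detail is an assumption for illustration, not a claim about LoCo's exact algorithm.

```python
import numpy as np

def compress_gradient(grad: np.ndarray, error_buffer: np.ndarray):
    """Quantize a gradient to int8 for communication, carrying the rounding
    error forward to the next step (error feedback)."""
    corrected = grad + error_buffer
    scale = max(np.max(np.abs(corrected)) / 127.0, 1e-12)
    q = np.clip(np.round(corrected / scale), -127, 127).astype(np.int8)
    new_error = corrected - q.astype(np.float32) * scale
    return q, scale, new_error

def decompress(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

grad = np.random.randn(1024).astype(np.float32) * 1e-3
err = np.zeros_like(grad)
q, scale, err = compress_gradient(grad, err)   # send q (1 byte/entry) + scale
print("bytes per entry:", q.itemsize, "max rounding error:", np.max(np.abs(err)))
```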

Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point

B Wang, A Berg, DAE Acar, C Zhou - arXiv preprint arXiv:2407.02610, 2024 - arxiv.org
Recent work has shown that 8-bit floating point (FP8) can be used for efficiently training
neural networks with reduced computational overhead compared to training in FP32/FP16 …
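The snippet below gives a rough, not bit-exact simulation of casting a client update to an FP8 (E4M3-like) grid by rounding the mantissa to about 3 bits and clamping to the E4M3 maximum of 448. It only illustrates the kind of precision FP8 offers for on-device training and communication; it is not the paper's training or aggregation procedure.

```python
import numpy as np

def simulate_fp8_e4m3(x: np.ndarray) -> np.ndarray:
    """Rough FP8 (E4M3-like) simulation: keep ~3 mantissa bits and clamp to
    the E4M3 maximum magnitude (448). Not bit-exact; for illustration only."""
    x = np.asarray(x, dtype=np.float32)
    mant, exp = np.frexp(x)                 # x = mant * 2**exp, |mant| in [0.5, 1)
    mant = np.round(mant * 16.0) / 16.0     # 1 implicit + 3 explicit mantissa bits
    y = np.ldexp(mant, exp)
    return np.clip(y, -448.0, 448.0).astype(np.float32)

update = np.random.randn(1000).astype(np.float32) * 0.01
fp8_update = simulate_fp8_e4m3(update)      # what a client would transmit
rel_err = np.abs(fp8_update - update) / (np.abs(update) + 1e-12)
print("mean relative error:", np.mean(rel_err))
```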