The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org
The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …

Language models scale reliably with over-training and on downstream tasks

SY Gadre, G Smyrnis, V Shankar, S Gururangan… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling laws are useful guides for derisking expensive training runs, as they predict
performance of large models using cheaper, small-scale experiments. However, there …

DART-Math: Difficulty-aware rejection tuning for mathematical problem-solving

Y Tong, X Zhang, R Wang, R Wu, J He - arXiv preprint arXiv:2407.13690, 2024 - arxiv.org
Solving mathematical problems requires advanced reasoning abilities and presents notable
challenges for large language models. Previous works usually synthesize data from …

Smart parallel automated cryo-electron tomography

F Eisenstein, Y Fukuda, R Danev - Nature Methods, 2024 - nature.com
In situ cryo-electron tomography enables investigation of macromolecules in their native
cellular environment. Samples have become more readily available owing to recent …

Allo: A Programming Model for Composable Accelerator Design

H Chen, N Zhang, S Xiang, Z Zeng, M Dai… - Proceedings of the ACM …, 2024 - dl.acm.org
Special-purpose hardware accelerators are increasingly pivotal for sustaining performance
improvements in emerging applications, especially as the benefits of technology scaling …

Advancing state of health estimation for electric vehicles: Transformer-based approach leveraging real-world data

K Nakano, S Vögler, K Tanaka - Advances in Applied Energy, 2024 - Elsevier
The widespread adoption of electric vehicles (EVs) underscores the urgent need for
innovative approaches to estimate their lithium-ion batteries' state of health (SOH), which is …

Eliminating position bias of language models: A mechanistic approach

Z Wang, H Zhang, X Li, KH Huang, C Han, S Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
Position bias has proven to be a prevalent issue of modern language models (LMs), where
the models prioritize content based on its position within the given context. This bias often …

Liger Kernel: Efficient Triton Kernels for LLM Training

PL Hsu, Y Dai, V Kothapalli, Q Song, S Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Large Language Models (LLMs) efficiently at scale presents a formidable
challenge, driven by their ever-increasing computational demands and the need for …

Efficient training of large language models on distributed infrastructures: A survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …