The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

LlamaFactory: Unified efficient fine-tuning of 100+ language models

Y Zheng, R Zhang, J Zhang, Y Ye, Z Luo - arXiv preprint arXiv:2403.13372, 2024 - arxiv.org
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial effort to implement these methods on different models. We …
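
To make the setting concrete, here is a minimal LoRA-style parameter-efficient fine-tuning sketch using the Hugging Face transformers and peft libraries; this is not LlamaFactory's own interface, and the model id and hyperparameters are illustrative assumptions.

```python
# Minimal LoRA fine-tuning sketch (illustrative only; not LlamaFactory's API).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3-8B"  # placeholder model id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach low-rank adapters to the attention projections; only the adapter
# weights are trained while the base model stays frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```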

Benchmarking benchmark leakage in large language models

R Xu, Z Wang, RZ Fan, P Liu - arXiv preprint arXiv:2404.18824, 2024 - arxiv.org
Amid the expanding use of pre-training data, the phenomenon of benchmark dataset
leakage has become increasingly prominent, exacerbated by opaque training processes …

Advancing LLM reasoning generalists with preference trees

L Yuan, G Cui, H Wang, N Ding, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.
Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art …

Common 7B language models already possess strong math capabilities

C Li, W Wang, J Hu, Y Wei, N Zheng, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical capabilities were previously believed to emerge in common language models
only at a very large scale or to require extensive math-related pre-training. This paper shows …

Rho-1: Not all tokens are what you need

Z Lin, Z Gou, Y Gong, X Liu, Y Shen, R Xu, C Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Previous language model pre-training methods have uniformly applied a next-token
prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a …
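
For context, a minimal PyTorch sketch of the uniform next-token prediction loss the snippet describes, with an optional per-token weight left as a placeholder hook; Rho-1's actual token-selection criterion is not reproduced here.

```python
# Uniform next-token loss (the standard pre-training baseline), plus a
# per-token weighting hook to contrast with "all tokens weighted equally".
import torch
import torch.nn.functional as F

def next_token_loss(logits, input_ids, token_weights=None):
    """logits: [batch, seq, vocab]; input_ids: [batch, seq]."""
    # Shift so the prediction at position t is scored against token t+1.
    logits = logits[:, :-1, :]
    targets = input_ids[:, 1:]
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    if token_weights is None:
        # Standard pre-training: every token contributes equally.
        return per_token.mean()
    # Weighted variant: down-weight or drop tokens deemed uninformative.
    w = token_weights[:, 1:]
    return (per_token * w).sum() / w.sum().clamp(min=1e-8)
```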

Compression represents intelligence linearly

Y Huang, J Zhang, Z Shan, J He - arXiv preprint arXiv:2404.09937, 2024 - arxiv.org
There is a belief that learning to compress well will lead to intelligence. Recently, language
modeling has been shown to be equivalent to compression, which offers a compelling …
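
A small worked example of the language-modeling-as-compression equivalence the snippet refers to: under arithmetic coding, a model that assigns probability p to the data can encode it in roughly -log2(p) bits, so average negative log-likelihood doubles as a compression rate. The numbers below are purely illustrative.

```python
# Compute a compression rate (bits per byte) from language-model log-probs.
import math

def bits_per_byte(token_log_probs, n_bytes):
    """token_log_probs: natural-log probabilities the LM assigns to each
    token of a text; n_bytes: length of that text in bytes."""
    total_bits = -sum(token_log_probs) / math.log(2)  # nats -> bits
    return total_bits / n_bytes

# E.g. assigning each of 1000 tokens probability 0.25 on a 4000-byte text
# costs 2 bits per token, i.e. about 0.5 bits per byte.
print(bits_per_byte([math.log(0.25)] * 1000, 4000))  # ~0.5
```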

MAP-Neo: Highly capable and transparent bilingual large language model series

G Zhang, S Qu, J Liu, C Zhang, C Lin, CL Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have made great strides in recent years to achieve
unprecedented performance across different tasks. However, due to commercial interest, the …

Towards modular LLMs by building and reusing a library of LoRAs

O Ostapenko, Z Su, EM Ponti, L Charlin… - arXiv preprint arXiv …, 2024 - arxiv.org
The growing number of parameter-efficient adaptations of a base large language model
(LLM) calls for studying whether we can reuse such trained adapters to improve …
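
As a minimal illustration of adapter reuse (not the paper's library-building or routing method), a sketch that loads and switches between two previously trained LoRA adapters with the Hugging Face peft API; the adapter paths and names are hypothetical.

```python
# Reuse trained LoRA adapters on one frozen base model (paths are placeholders).
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach one previously trained adapter, then add a second from the library.
model = PeftModel.from_pretrained(base, "path/to/adapter-math", adapter_name="math")
model.load_adapter("path/to/adapter-code", adapter_name="code")

# Activate whichever adapter suits the downstream task.
model.set_adapter("code")
```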

Reverse training to nurse the reversal curse

O Golovneva, Z Allen-Zhu, J Weston… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have a surprising failure: when trained on "A has a feature
B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even …