The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

LlamaFactory: Unified efficient fine-tuning of 100+ language models

Y Zheng, R Zhang, J Zhang, Y Ye, Z Luo - arXiv preprint arXiv:2403.13372, 2024 - arxiv.org
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial effort to implement these methods on different models. We …
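
To make the setting concrete, here is a minimal LoRA-style parameter-efficient fine-tuning sketch using the Hugging Face transformers and peft libraries; this is not LlamaFactory's own interface, and the model id and hyperparameters are illustrative assumptions.

```python
# Minimal LoRA fine-tuning sketch (illustrative only; not LlamaFactory's API).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3-8B"  # placeholder model id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach low-rank adapters to the attention projections; only the adapter
# weights are trained while the base model stays frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```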

Benchmarking benchmark leakage in large language models

R Xu, Z Wang, RZ Fan, P Liu - arXiv preprint arXiv:2404.18824, 2024 - arxiv.org
Amid the expanding use of pre-training data, the phenomenon of benchmark dataset
leakage has become increasingly prominent, exacerbated by opaque training processes …

Advancing LLM reasoning generalists with preference trees

L Yuan, G Cui, H Wang, N Ding, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.
Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art …

Common 7B language models already possess strong math capabilities

C Li, W Wang, J Hu, Y Wei, N Zheng, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical capabilities were previously believed to emerge in common language models
only at a very large scale or to require extensive math-related pre-training. This paper shows …

Rho-1: Not all tokens are what you need

Z Lin, Z Gou, Y Gong, X Liu, Y Shen, R Xu, C Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Previous language model pre-training methods have uniformly applied a next-token
prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a …
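
For context, a minimal PyTorch sketch of the uniform next-token prediction loss the snippet describes, with an optional per-token weight left as a placeholder hook; Rho-1's actual token-selection criterion is not reproduced here.

```python
# Uniform next-token loss (the standard pre-training baseline), plus a
# per-token weighting hook to contrast with "all tokens weighted equally".
import torch
import torch.nn.functional as F

def next_token_loss(logits, input_ids, token_weights=None):
    """logits: [batch, seq, vocab]; input_ids: [batch, seq]."""
    # Shift so the prediction at position t is scored against token t+1.
    logits = logits[:, :-1, :]
    targets = input_ids[:, 1:]
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    if token_weights is None:
        # Standard pre-training: every token contributes equally.
        return per_token.mean()
    # Weighted variant: down-weight or drop tokens deemed uninformative.
    w = token_weights[:, 1:]
    return (per_token * w).sum() / w.sum().clamp(min=1e-8)
```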

Compression represents intelligence linearly

Y Huang, J Zhang, Z Shan, J He - arXiv preprint arXiv:2404.09937, 2024 - arxiv.org
There is a belief that learning to compress well will lead to intelligence. Recently, language
modeling has been shown to be equivalent to compression, which offers a compelling …
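
A small worked example of the language-modeling-as-compression equivalence the snippet refers to: under arithmetic coding, a model that assigns probability p to the data can encode it in roughly -log2(p) bits, so average negative log-likelihood doubles as a compression rate. The numbers below are purely illustrative.

```python
# Compute a compression rate (bits per byte) from language-model log-probs.
import math

def bits_per_byte(token_log_probs, n_bytes):
    """token_log_probs: natural-log probabilities the LM assigns to each
    token of a text; n_bytes: length of that text in bytes."""
    total_bits = -sum(token_log_probs) / math.log(2)  # nats -> bits
    return total_bits / n_bytes

# E.g. assigning each of 1000 tokens probability 0.25 on a 4000-byte text
# costs 2 bits per token, i.e. about 0.5 bits per byte.
print(bits_per_byte([math.log(0.25)] * 1000, 4000))  # ~0.5
```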

MAP-Neo: Highly capable and transparent bilingual large language model series

G Zhang, S Qu, J Liu, C Zhang, C Lin, CL Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have made great strides in recent years to achieve
unprecedented performance across different tasks. However, due to commercial interest, the …

Towards modular LLMs by building and reusing a library of LoRAs

O Ostapenko, Z Su, EM Ponti, L Charlin… - arXiv preprint arXiv …, 2024 - arxiv.org
The growing number of parameter-efficient adaptations of a base large language model
(LLM) calls for studying whether we can reuse such trained adapters to improve …
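
As a minimal illustration of adapter reuse (not the paper's library-building or routing method), a sketch that loads and switches between two previously trained LoRA adapters with the Hugging Face peft API; the adapter paths and names are hypothetical.

```python
# Reuse trained LoRA adapters on one frozen base model (paths are placeholders).
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach one previously trained adapter, then add a second from the library.
model = PeftModel.from_pretrained(base, "path/to/adapter-math", adapter_name="math")
model.load_adapter("path/to/adapter-code", adapter_name="code")

# Activate whichever adapter suits the downstream task.
model.set_adapter("code")
```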

Reverse training to nurse the reversal curse

O Golovneva, Z Allen-Zhu, J Weston… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have a surprising failure: when trained on "A has a feature
B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even …