On the importance of pre-training data volume for compact language models

V Micheli, M d'Hoffschmidt, F Fleuret - arXiv preprint arXiv:2010.03813, 2020 - arxiv.org
Recent advances in language modeling have led to computationally intensive and resource-
demanding state-of-the-art models. In an effort towards sustainable practices, we study the …

BayLing: Bridging cross-lingual alignment and instruction following through interactive translation for large language models

S Zhang, Q Fang, Z Zhang, Z Ma, Y Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable prowess in language
understanding and generation. Advancing from foundation LLMs to instruction-following …

h2oGPT: Democratizing large language models

A Candel, J McKinney, P Singer, P Pfeiffer… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due
to their real-world applications through natural language processing. However, they also …

Knowledge fusion of large language models

F Wan, X Huang, D Cai, X Quan, W Bi, S Shi - arXiv preprint arXiv …, 2024 - arxiv.org
While training large language models (LLMs) from scratch can generate models with distinct
functionalities and strengths, it comes at significant costs and may result in redundant …

Large language models suffer from their own output: An analysis of the self-consuming training loop

M Briesch, D Sobania, F Rothlauf - arXiv preprint arXiv:2311.16822, 2023 - arxiv.org
Large language models (LLMs) have become state of the art in many benchmarks, and
conversational LLM applications like ChatGPT are now widely used by the public. Those …

Typhoon: Thai large language models

K Pipatanakul, P Jirabovonvisut, P Manakul… - arXiv preprint arXiv …, 2023 - arxiv.org
Typhoon is a series of Thai large language models (LLMs) developed specifically for the
Thai language. This technical report presents challenges and insights in developing Thai …

Are larger pretrained language models uniformly better? Comparing performance at the instance level

R Zhong, D Ghosh, D Klein, J Steinhardt - arXiv preprint arXiv:2105.06020, 2021 - arxiv.org
Larger language models have higher accuracy on average, but are they better on every
single instance (datapoint)? Some work suggests larger models have higher out-of …

HyperTuning: Toward adapting large language models without back-propagation

J Phang, Y Mao, P He, W Chen - … Conference on Machine …, 2023 - proceedings.mlr.press
Fine-tuning large language models for different tasks can be costly and inefficient, and even
methods that reduce the number of tuned parameters still require full gradient-based …

Can we trust the evaluation on ChatGPT?

R Aiyappa, J An, H Kwak, YY Ahn - arXiv preprint arXiv:2303.12767, 2023 - arxiv.org
ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated
remarkable performance in numerous natural language tasks. Despite its evident …

Dolma: An open corpus of three trillion tokens for language model pretraining research

L Soldaini, R Kinney, A Bhagia, D Schwenk… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models have become a critical technology for tackling a wide range of natural
language processing tasks, yet many details about how the best-performing language …