A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

Leave no context behind: Efficient infinite context transformers with infini-attention

T Munkhdalai, M Faruqui, S Gopal - arXiv preprint arXiv:2404.07143, 2024 - arxiv.org
This work introduces an efficient method to scale Transformer-based Large Language
Models (LLMs) to infinitely long inputs with bounded memory and computation. A key …

Language models scale reliably with over-training and on downstream tasks

SY Gadre, G Smyrnis, V Shankar, S Gururangan… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling laws are useful guides for developing language models, but there are still gaps
between current scaling studies and how language models are ultimately trained and …

MobiLlama: Towards accurate and lightweight fully transparent GPT

O Thawakar, A Vayani, S Khan, H Cholakal… - arXiv preprint arXiv …, 2024 - arxiv.org
"Bigger the better" has been the predominant trend in recent Large Language Model
(LLM) development. However, LLMs are not well suited for scenarios that require on-device …

Materials science in the era of large language models: a perspective

G Lei, R Docherty, SJ Cooper - Digital Discovery, 2024 - pubs.rsc.org
Large Language Models (LLMs) have garnered considerable interest due to their
impressive natural language capabilities, which in conjunction with various emergent …

Zamba: A Compact 7B SSM Hybrid Model

P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

S Mehta, MH Sekhavat, Q Cao, M Horton, Y Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
The reproducibility and transparency of large language models are crucial for advancing
open research, ensuring the trustworthiness of results, and enabling investigations into data …

TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese

NK Corrêa, S Falk, S Fatimah, A Sen… - Machine Learning with …, 2024 - Elsevier
Large language models (LLMs) have significantly advanced natural language processing,
but their progress has not been equal across languages. While most LLMs are trained in …

Octopus v4: Graph of language models

W Chen, Z Li - arXiv preprint arXiv:2404.19296, 2024 - arxiv.org
Language models have been effective in a wide range of applications, yet the most
sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various …

Large language models present new questions for decision support

A Handler, KR Larsen, R Hackathorn - International Journal of Information …, 2024 - Elsevier
Large language models (LLMs) have proven capable of assisting with many aspects of
organizational decision making, such as helping to collect information from databases and …