A survey on data selection for language models
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …
Leave no context behind: Efficient infinite context transformers with infini-attention
T Munkhdalai, M Faruqui, S Gopal - arXiv preprint arXiv:2404.07143, 2024 - arxiv.org
This work introduces an efficient method to scale Transformer-based Large Language
Models (LLMs) to infinitely long inputs with bounded memory and computation. A key …
Language models scale reliably with over-training and on downstream tasks
Scaling laws are useful guides for developing language models, but there are still gaps
between current scaling studies and how language models are ultimately trained and …
MobiLlama: Towards accurate and lightweight fully transparent GPT
" Bigger the better" has been the predominant trend in recent Large Language Models
(LLMs) development. However, LLMs do not suit well for scenarios that require on-device …
Materials science in the era of large language models: a perspective
Large Language Models (LLMs) have garnered considerable interest due to their
impressive natural language capabilities, which in conjunction with various emergent …
Zamba: A Compact 7B SSM Hybrid Model
P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
The reproducibility and transparency of large language models are crucial for advancing
open research, ensuring the trustworthiness of results, and enabling investigations into data …
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
NK Corrêa, S Falk, S Fatimah, A Sen… - Machine Learning with …, 2024 - Elsevier
Large language models (LLMs) have significantly advanced natural language processing,
but their progress has yet to be equal across languages. While most LLMs are trained in …
Octopus v4: Graph of language models
W Chen, Z Li - arXiv preprint arXiv:2404.19296, 2024 - arxiv.org
Language models have been effective in a wide range of applications, yet the most
sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various …
Large language models present new questions for decision support
Large language models (LLMs) have proven capable of assisting with many aspects of
organizational decision making, such as helping to collect information from databases and …