Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2023 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
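
A central quantity in this line of work is how much repeated data is worth relative to unique data. As a hedged illustration of the paper's idea that repeated tokens yield diminishing returns (the function, parameter names, and the constant r_star below are placeholders, not the paper's fitted values):

    # Sketch: effective data when repeating a fixed pool of unique tokens.
    # Extra epochs are discounted exponentially; r_star controls how fast
    # the value of additional repetitions decays (placeholder value).
    import math

    def effective_data(unique_tokens, repetitions, r_star=15.0):
        return unique_tokens * (1.0 + r_star * (1.0 - math.exp(-repetitions / r_star)))

    # Three extra passes over 100B unique tokens are worth less than 400B:
    print(f"{effective_data(100e9, repetitions=3):.3g}")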

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Scaling laws of synthetic images for model training... for now

L Fan, K Chen, D Krishnan, D Katabi… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent significant advances in text-to-image models unlock the possibility of training vision
systems using synthetic images, potentially overcoming the difficulty of collecting curated …

Generalization on the unseen, logic reasoning and degree curriculum

E Abbe, S Bengio, A Lotfi, K Rizk - Journal of Machine Learning Research, 2024 - jmlr.org
This paper considers the learning of logical (Boolean) functions with a focus on the
generalization on the unseen (GOTU) setting, a strong case of out-of-distribution …
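
Concretely, GOTU trains on a restricted portion of the Boolean hypercube and evaluates only on the held-out portion. A minimal sketch of such a split (the frozen coordinate and the target function are illustrative choices, not the paper's exact protocol):

    # Sketch of a GOTU-style split on {-1, 1}^d: train where the first
    # coordinate is frozen to +1; the unseen half (x[0] == -1) is used
    # only for evaluation.
    import itertools

    d = 4  # input dimension (illustrative)
    cube = list(itertools.product([-1, 1], repeat=d))
    train = [x for x in cube if x[0] == 1]
    unseen = [x for x in cube if x[0] == -1]

    target = lambda x: x[0] * x[1]  # a simple monomial target (illustrative)
    print(len(train), len(unseen))  # 8 8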

No train no gain: Revisiting efficient training algorithms for transformer-based language models

J Kaddour, O Key, P Nawrot… - Advances in Neural …, 2024 - proceedings.neurips.cc
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …

Getting ViT in shape: Scaling laws for compute-optimal model design

IM Alabdulmohsin, X Zhai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Scaling laws have been recently employed to derive compute-optimal model size (number
of parameters) for a given compute duration. We advance and refine such methods to infer …
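
Fitting such a scaling law typically reduces to linear regression in log-log space, e.g. N_opt ≈ k · C^a. A hedged sketch of that fit (the data points below are invented placeholders, not results from the paper):

    # Sketch: fit N_opt = k * C^a by regressing log(N_opt) on log(C).
    import numpy as np

    compute = np.array([1e18, 1e19, 1e20, 1e21])    # FLOPs (placeholder)
    n_opt = np.array([1.0e8, 2.2e8, 5.0e8, 1.1e9])  # params (placeholder)

    a, log_k = np.polyfit(np.log(compute), np.log(n_opt), 1)
    print(f"N_opt ~ {np.exp(log_k):.3g} * C^{a:.2f}")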

ARB: Advanced reasoning benchmark for large language models

T Sawada, D Paleka, A Havrilla, P Tadepalli… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable performance on various
quantitative reasoning and knowledge benchmarks. However, many of these benchmarks …

The efficiency spectrum of large language models: An algorithmic survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

Broken neural scaling laws

E Caballero, K Gupta, I Rish, D Krueger - arXiv preprint arXiv:2210.14891, 2022 - arxiv.org
We present a smoothly broken power law functional form (that we refer to as a Broken
Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of …
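
For reference, the general BNSL parameterization is y = a + b·x^(-c0) · Π_i (1 + (x/d_i)^(1/f_i))^(-c_i·f_i). A sketch of a one-break instance, following that form (all constants below are placeholders, not fitted values):

    # Sketch: one-break smoothly broken power law. d1 sets where the
    # break occurs, f1 its sharpness, and c0/c1 the slopes before and
    # after the break (all placeholder values).
    def bnsl_one_break(x, a=0.1, b=2.0, c0=0.3, c1=0.5, d1=1e4, f1=0.2):
        return a + b * x**-c0 * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)

    for x in (1e2, 1e4, 1e6):
        print(x, bnsl_one_break(x))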

On data scaling in masked image modeling

Z Xie, Z Zhang, Y Cao, Y Lin, Y Wei… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scaling properties have been a central issue in self-supervised pre-training, especially
data scalability, which has successfully motivated the large-scale self …