Scaling data-constrained language models
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
Scaling laws of synthetic images for model training... for now
Recent significant advances in text-to-image models unlock the possibility of training vision
systems using synthetic images, potentially overcoming the difficulty of collecting curated …
Generalization on the unseen, logic reasoning and degree curriculum
This paper considers the learning of logical (Boolean) functions with a focus on the
generalization on the unseen (GOTU) setting, a strong case of out-of-distribution …
No train no gain: Revisiting efficient training algorithms for transformer-based language models
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …
Getting vit in shape: Scaling laws for compute-optimal model design
IM Alabdulmohsin, X Zhai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Scaling laws have been recently employed to derive compute-optimal model size (number
of parameters) for a given compute duration. We advance and refine such methods to infer …
Arb: Advanced reasoning benchmark for large language models
Large Language Models (LLMs) have demonstrated remarkable performance on various
quantitative reasoning and knowledge benchmarks. However, many of these benchmarks …
The efficiency spectrum of large language models: An algorithmic survey
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …
Broken neural scaling laws
We present a smoothly broken power law functional form (that we refer to as a Broken
Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of …
On data scaling in masked image modeling
Scaling properties have been one of the central issues in self-supervised pre-training,
especially the data scalability, which has successfully motivated the large-scale self …