Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2023 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
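
A central quantity in this line of work is how much repeated data is worth relative to unique data. As a hedged illustration of the paper's idea that repeated tokens yield diminishing returns (the function, parameter names, and the constant r_star below are placeholders, not the paper's fitted values):

    # Sketch: effective data when repeating a fixed pool of unique tokens.
    # Extra epochs are discounted exponentially; r_star controls how fast
    # the value of additional repetitions decays (placeholder value).
    import math

    def effective_data(unique_tokens, repetitions, r_star=15.0):
        return unique_tokens * (1.0 + r_star * (1.0 - math.exp(-repetitions / r_star)))

    # Three extra passes over 100B unique tokens are worth less than 400B:
    print(f"{effective_data(100e9, repetitions=3):.3g}")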

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Scaling laws of synthetic images for model training... for now

L Fan, K Chen, D Krishnan, D Katabi… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent significant advances in text-to-image models unlock the possibility of training vision
systems using synthetic images, potentially overcoming the difficulty of collecting curated …

Generalization on the unseen, logic reasoning and degree curriculum

E Abbe, S Bengio, A Lotfi, K Rizk - Journal of Machine Learning Research, 2024 - jmlr.org
This paper considers the learning of logical (Boolean) functions with a focus on the
generalization on the unseen (GOTU) setting, a strong case of out-of-distribution …
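
Concretely, GOTU trains on a restricted portion of the Boolean hypercube and evaluates only on the held-out portion. A minimal sketch of such a split (the frozen coordinate and the target function are illustrative choices, not the paper's exact protocol):

    # Sketch of a GOTU-style split on {-1, 1}^d: train where the first
    # coordinate is frozen to +1; the unseen half (x[0] == -1) is used
    # only for evaluation.
    import itertools

    d = 4  # input dimension (illustrative)
    cube = list(itertools.product([-1, 1], repeat=d))
    train = [x for x in cube if x[0] == 1]
    unseen = [x for x in cube if x[0] == -1]

    target = lambda x: x[0] * x[1]  # a simple monomial target (illustrative)
    print(len(train), len(unseen))  # 8 8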

No train no gain: Revisiting efficient training algorithms for transformer-based language models

J Kaddour, O Key, P Nawrot… - Advances in Neural …, 2024 - proceedings.neurips.cc
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …

Getting ViT in shape: Scaling laws for compute-optimal model design

IM Alabdulmohsin, X Zhai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Scaling laws have been recently employed to derive compute-optimal model size (number
of parameters) for a given compute duration. We advance and refine such methods to infer …
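
Fitting such a scaling law typically reduces to linear regression in log-log space, e.g. N_opt ≈ k · C^a. A hedged sketch of that fit (the data points below are invented placeholders, not results from the paper):

    # Sketch: fit N_opt = k * C^a by regressing log(N_opt) on log(C).
    import numpy as np

    compute = np.array([1e18, 1e19, 1e20, 1e21])    # FLOPs (placeholder)
    n_opt = np.array([1.0e8, 2.2e8, 5.0e8, 1.1e9])  # params (placeholder)

    a, log_k = np.polyfit(np.log(compute), np.log(n_opt), 1)
    print(f"N_opt ~ {np.exp(log_k):.3g} * C^{a:.2f}")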

ARB: Advanced reasoning benchmark for large language models

T Sawada, D Paleka, A Havrilla, P Tadepalli… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable performance on various
quantitative reasoning and knowledge benchmarks. However, many of these benchmarks …

The efficiency spectrum of large language models: An algorithmic survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

Broken neural scaling laws

E Caballero, K Gupta, I Rish, D Krueger - arXiv preprint arXiv:2210.14891, 2022 - arxiv.org
We present a smoothly broken power law functional form (that we refer to as a Broken
Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of …
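
For reference, the general BNSL parameterization is y = a + b·x^(-c0) · Π_i (1 + (x/d_i)^(1/f_i))^(-c_i·f_i). A sketch of a one-break instance, following that form (all constants below are placeholders, not fitted values):

    # Sketch: one-break smoothly broken power law. d1 sets where the
    # break occurs, f1 its sharpness, and c0/c1 the slopes before and
    # after the break (all placeholder values).
    def bnsl_one_break(x, a=0.1, b=2.0, c0=0.3, c1=0.5, d1=1e4, f1=0.2):
        return a + b * x**-c0 * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)

    for x in (1e2, 1e4, 1e6):
        print(x, bnsl_one_break(x))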

On data scaling in masked image modeling

Z Xie, Z Zhang, Y Cao, Y Lin, Y Wei… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scaling properties have been a central issue in self-supervised pre-training, especially
data scalability, which has successfully motivated the large-scale self …