Deep double descent: Where bigger models and more data hurt

C Shorten, TM Khoshgoftaar, B Furht - Journal of Big Data, 2021 - Springer

Abstract Natural Language Processing (NLP) is one of the most captivating applications of
Deep Learning. In this survey, we consider how the Data Augmentation training strategy can …

Cited by 486 · Related articles · All 15 versions

[PDF] arxiv.org

Model complexity of deep learning: A survey

X Hu, L Chu, J Pei, W Liu, J Bian - Knowledge and Information Systems, 2021 - Springer

Abstract Model complexity is a fundamental problem in deep learning. In this paper, we
conduct a systematic overview of the latest studies on model complexity in deep learning …

Cited by 273 · Related articles · All 6 versions

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Cited by 2300 · Related articles · All 4 versions

[PDF] arxiv.org

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org

Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

Cited by 976 · Related articles · All 11 versions

[PDF] neurips.cc

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2024 - proceedings.neurips.cc

The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …

Cited by 141 · Related articles · All 7 versions

[PDF] neurips.cc

Memorization without overfitting: Analyzing the training dynamics of large language models

K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc

Despite their wide adoption, the underlying training and memorization dynamics of very
large language models is not well understood. We empirically study exact memorization in …

Cited by 187 · Related articles · All 5 versions

[PDF] arxiv.org

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Cited by 3667 · Related articles · All 2 versions

[PDF] mlr.press

Towards understanding sharpness-aware minimization

M Andriushchenko… - … Conference on Machine …, 2022 - proceedings.mlr.press

Abstract Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations which significantly improves generalization in various …

Cited by 122 · Related articles · All 4 versions

[PDF] jmlr.org

Underspecification presents challenges for credibility in modern machine learning

A D'Amour, K Heller, D Moldovan, B Adlam… - Journal of Machine …, 2022 - jmlr.org

Machine learning (ML) systems often exhibit unexpectedly poor behavior when they are
deployed in real-world domains. We identify underspecification in ML pipelines as a key …

Cited by 742 · Related articles · All 10 versions

[PDF] pubpub.org

[PDF] The computational limits of deep learning

NC Thompson, K Greenewald, K Lee… - arXiv preprint arXiv …, 2020 - assets.pubpub.org

Deep learning's recent history has been one of achievement: from triumphing over humans
in the game of Go to world-leading performance in image classification, voice recognition …

Cited by 663 · Related articles · All 8 versions