On large-batch training for deep learning: Generalization gap and sharp minima NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang arXiv preprint arXiv:1609.04836, 2016 | 3521 | 2016 |
Gpt-4 technical report J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ... arXiv preprint arXiv:2303.08774, 2023 | 2948* | 2023 |
Regularizing and optimizing LSTM language models S Merity, NS Keskar, R Socher arXiv preprint arXiv:1708.02182, 2017 | 1289 | 2017 |
Ctrl: A conditional transformer language model for controllable generation NS Keskar, B McCann, LR Varshney, C Xiong, R Socher arXiv preprint arXiv:1909.05858, 2019 | 1144 | 2019 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 843 | 2022 |
The natural language decathlon: Multitask learning as question answering B McCann, NS Keskar, C Xiong, R Socher arXiv preprint arXiv:1806.08730, 2018 | 665 | 2018 |
Improving generalization performance by switching from adam to sgd NS Keskar, R Socher arXiv preprint arXiv:1712.07628, 2017 | 645 | 2017 |
Neural text summarization: A critical evaluation W Kryściński, NS Keskar, B McCann, C Xiong, R Socher arXiv preprint arXiv:1908.08960, 2019 | 388 | 2019 |
Gedi: Generative discriminator guided sequence generation B Krause, AD Gotmare, B McCann, NS Keskar, S Joty, R Socher, ... arXiv preprint arXiv:2009.06367, 2020 | 327 | 2020 |
A closer look at deep learning heuristics: Learning rate restarts, warmup and distillation A Gotmare, NS Keskar, C Xiong, R Socher arXiv preprint arXiv:1810.13243, 2018 | 297 | 2018 |
Progen: Language modeling for protein generation A Madani, B McCann, N Naik, NS Keskar, N Anand, RR Eguchi, ... arXiv preprint arXiv:2004.03497, 2020 | 249 | 2020 |
An analysis of neural language modeling at multiple scales S Merity, NS Keskar, R Socher arXiv preprint arXiv:1803.08240, 2018 | 190 | 2018 |
Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains N Naik, A Madani, A Esteva, NS Keskar, MF Press, D Ruderman, DB Agus, ... Nature communications 11 (1), 5727, 2020 | 186 | 2020 |
Weighted transformer network for machine translation K Ahmed, NS Keskar, R Socher arXiv preprint arXiv:1711.02132, 2017 | 161 | 2017 |
Balancing communication and computation in distributed optimization AS Berahas, R Bollapragada, NS Keskar, E Wei IEEE Transactions on Automatic Control 64 (8), 3141-3155, 2018 | 118 | 2018 |
Sequence-to-sequence prediction using a neural network model NS Keskar, K Ahmed, R Socher US Patent 11,928,600, 2024 | 109 | 2024 |
Multitask learning as question answering NS Keskar, B McCann, C Xiong, R Socher US Patent 11,501,076, 2022 | 89 | 2022 |
Multitask learning as question answering B McCann, NS Keskar, C Xiong, R Socher US Patent 10,776,581, 2020 | 84 | 2020 |
Xlda: Cross-lingual data augmentation for natural language inference and question answering J Singh, B McCann, NS Keskar, C Xiong, R Socher arXiv preprint arXiv:1905.11471, 2019 | 78 | 2019 |
Coarse-grain fine-grain coattention network for multi-evidence question answering V Zhong, C Xiong, NS Keskar, R Socher arXiv preprint arXiv:1901.00603, 2019 | 75 | 2019 |