| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Deep Neural Networks as Gaussian Processes | J Lee*, Y Bahri*, R Novak, SS Schoenholz, J Pennington, ... | International Conference on Learning Representations (ICLR) | 1238 | 2018 |
| Wide neural networks of any depth evolve as linear models under gradient descent | J Lee*, L Xiao*, SS Schoenholz, Y Bahri, J Sohl-Dickstein, J Pennington | Neural Information Processing Systems (NeurIPS) | 1059 | 2019 |
| Beyond the imitation game: Quantifying and extrapolating the capabilities of language models | A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... | Transactions on Machine Learning Research (TMLR), 2023 | 883 | 2022 |
| Measuring the effects of data parallelism on neural network training | CJ Shallue*, J Lee*, J Antognini, J Sohl-Dickstein, R Frostig, GE Dahl | Journal of Machine Learning Research 20, 1-49 | 427 | 2019 |
| Bayesian Deep Convolutional Neural Networks with Many Channels are Gaussian Processes | R Novak*, L Xiao*, J Lee, Y Bahri, G Yang, D Abolafia, J Pennington, ... | International Conference on Learning Representations (ICLR) | 366* | 2019 |
| On empirical comparisons of optimizers for deep learning | D Choi, CJ Shallue, Z Nado, J Lee, CJ Maddison, GE Dahl | arXiv preprint arXiv:1910.05446 | 348 | 2019 |
| Neural Tangents: Fast and easy infinite neural networks in Python | R Novak*, L Xiao*, J Hron, J Lee, AA Alemi, J Sohl-Dickstein, ... | International Conference on Learning Representations (ICLR), Spotlight | 252 | 2020 |
| Dataset Distillation with Infinitely Wide Convolutional Networks | T Nguyen, R Novak, L Xiao, J Lee | Neural Information Processing Systems (NeurIPS) | 206 | 2021 |
| Finite versus infinite neural networks: an empirical study | J Lee, SS Schoenholz, J Pennington, B Adlam, L Xiao, R Novak, ... | Neural Information Processing Systems (NeurIPS), Spotlight | 202 | 2020 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... | arXiv preprint arXiv:2403.05530 | 197 | 2024 |
| Dataset Meta-Learning from Kernel Ridge-Regression | T Nguyen, Z Chen, J Lee | International Conference on Learning Representations (ICLR) | 197 | 2021 |
| The superconformal bootstrap in three dimensions | SM Chester, J Lee, SS Pufu, R Yacoby | Journal of High Energy Physics 2014 (9), 1-59 | 166 | 2014 |
| Explaining neural scaling laws | Y Bahri*, E Dyer*, J Kaplan*, J Lee*, U Sharma* | arXiv preprint arXiv:2102.06701 | 165 | 2021 |
| Exact correlators of BPS operators from the 3d superconformal bootstrap | SM Chester, J Lee, SS Pufu, R Yacoby | Journal of High Energy Physics 2015 (3), 1-55 | 149 | 2015 |
| On the infinite width limit of neural networks with a standard parameterization | J Sohl-Dickstein, R Novak, SS Schoenholz, J Lee | arXiv preprint arXiv:2001.07301 | 52 | 2020 |
| Algebra of Majorana doubling | J Lee, F Wilczek | Physical Review Letters 111 (22), 226402 | 37 | 2013 |
| Beyond human data: Scaling self-training for problem-solving with language models | A Singh, JD Co-Reyes, R Agarwal, A Anand, P Patil, PJ Liu, J Harrison, ... | arXiv preprint arXiv:2312.06585 | 35 | 2023 |
| Towards NNGP-guided Neural Architecture Search | DS Park*, J Lee*, D Peng, Y Cao, J Sohl-Dickstein | arXiv preprint arXiv:2011.06006 | 33 | 2020 |
| 3d minimal SCFTs from wrapped M5-branes | JB Bae, D Gang, J Lee | Journal of High Energy Physics 2017 (8), 118 | 31 | 2017 |
| GLSMs for non-Kähler geometries | A Adams, E Dyer, J Lee | Journal of High Energy Physics 2013 (1), 1-39 | 31 | 2013 |