Don't Decay the Learning Rate, Increase the Batch Size SL Smith, PJ Kindermans, C Ying, QV Le International Conference on Learning Representations, 2018 | 1226 | 2018 |
Ultrafast long-range charge separation in organic semiconductor photovoltaic diodes S Gélinas, A Rao, A Kumar, SL Smith, AW Chin, J Clark, TS van der Poll, ... Science 343 (6170), 512-516, 2014 | 1017 | 2014 |
Offline bilingual word vectors, orthogonal transformations and the inverted softmax SL Smith, DHP Turban, S Hamblin, NY Hammerla International Conference on Learning Representations, 2017 | 612 | 2017 |
High-performance large-scale image recognition without normalization A Brock, S De, SL Smith, K Simonyan International Conference on Machine Learning, 1059-1071, 2021 | 530 | 2021 |
A Bayesian Perspective on Generalization and Stochastic Gradient Descent SL Smith, QV Le International Conference on Learning Representations, 2018 | 411 | 2018 |
Gemma: Open Models Based on Gemini Research and Technology Gemma Team, T Mesnard, C Hardin, R Dadashi, S Bhupatiraju, S Pathak, ... arXiv preprint arXiv:2403.08295, 2024 | 254 | 2024 |
The future of quantum biology A Marais, B Adams, AK Ringsmuth, M Ferretti, JM Gruber, R Hendrikx, ... Journal of the Royal Society Interface 15 (148), 20180640, 2018 | 230 | 2018 |
On the Origin of Implicit Regularization in Stochastic Gradient Descent SL Smith, B Dherin, DGT Barrett, S De arXiv preprint arXiv:2101.12176, 2021 | 192 | 2021 |
Batch normalization biases residual blocks towards the identity function in deep networks S De, S Smith Advances in Neural Information Processing Systems 33, 19964-19975, 2020 | 161 | 2020 |
Unlocking high-accuracy differentially private image classification through scale S De, L Berrada, J Hayes, SL Smith, B Balle arXiv preprint arXiv:2204.13650, 2022 | 159 | 2022 |
Resurrecting recurrent neural networks for long sequences A Orvieto, SL Smith, A Gu, A Fernando, C Gulcehre, R Pascanu, S De International Conference on Machine Learning, 26670-26698, 2023 | 151 | 2023 |
Characterizing signal propagation to close the performance gap in unnormalized ResNets A Brock, S De, SL Smith arXiv preprint arXiv:2101.08692, 2021 | 125 | 2021 |
On the Generalization Benefit of Noise in Stochastic Gradient Descent S Smith, E Elsen, S De International Conference on Machine Learning, 9058-9067, 2020 | 98 | 2020 |
BYOL works even without batch statistics PH Richemond, JB Grill, F Altché, C Tallec, F Strub, A Brock, S Smith, ... arXiv preprint arXiv:2010.10241, 2020 | 94 | 2020 |
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study DS Park, J Sohl-Dickstein, QV Le, SL Smith International Conference on Machine Learning, 2019 | 54 | 2019 |
Differentially Private Diffusion Models Generate Useful Synthetic Images S Ghalebikesabi, L Berrada, S Gowal, I Ktena, R Stanforth, J Hayes, S De, ... arXiv preprint arXiv:2302.13861, 2023 | 45 | 2023 |
Phonon-assisted ultrafast charge separation in the PCBM band structure SL Smith, AW Chin Physical Review B 91 (20), 201302, 2015 | 41 | 2015 |
Ultrafast charge separation and nongeminate electron–hole recombination in organic photovoltaics SL Smith, AW Chin Physical Chemistry Chemical Physics 16 (38), 20305-20309, 2014 | 41 | 2014 |
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models S De, SL Smith, A Fernando, A Botev, G Cristian-Muraru, A Gu, R Haroun, ... arXiv preprint arXiv:2402.19427, 2024 | 33 | 2024 |
Drawing multiple augmentation samples per image during training efficiently decreases test error S Fort, A Brock, R Pascanu, S De, SL Smith arXiv preprint arXiv:2105.13343, 2021 | 28 | 2021 |