| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Deconstructing lottery tickets: Zeros, signs, and the supermask | H Zhou, J Lan, R Liu, J Yosinski | Advances in Neural Information Processing Systems 32, 2019 | 449 | 2019 |
| Teaching algorithmic reasoning via in-context learning | H Zhou, A Nova, H Larochelle, A Courville, B Neyshabur, H Sedghi | arXiv preprint arXiv:2211.09066, 2022 | 85* | 2022 |
| What algorithms can transformers learn? A study in length generalization | H Zhou, A Bradley, E Littwin, N Razin, O Saremi, J Susskind, S Bengio, ... | arXiv preprint arXiv:2310.16028, 2023 | 47 | 2023 |
| Fortuitous forgetting in connectionist networks | H Zhou, A Vani, H Larochelle, A Courville | International Conference on Learning Representations, 2021 | 30 | 2021 |
| LCA: Loss change allocation for neural network training | J Lan, R Liu, H Zhou, J Yosinski | Advances in Neural Information Processing Systems 32, 2019 | 25 | 2019 |
| Predicting grokking long before it happens: A look into the loss landscape of models which grok | P Notsawo Jr, H Zhou, M Pezeshki, I Rish, G Dumas | arXiv preprint arXiv:2306.13253, 2023 | 7 | 2023 |
| Vanishing gradients in reinforcement finetuning of language models | N Razin, H Zhou, O Saremi, V Thilak, A Bradley, P Nakkiran, J Susskind, ... | arXiv preprint arXiv:2310.20703, 2023 | 3 | 2023 |
| Step-by-Step Diffusion: An Elementary Tutorial | P Nakkiran, A Bradley, H Zhou, M Advani | arXiv preprint arXiv:2406.08929, 2024 | | 2024 |