Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. S Arora, S Du, W Hu, Z Li, R Wang. International Conference on Machine Learning, 322-332, 2019. Cited by 1002.
On exact computation with an infinitely wide neural net. S Arora, SS Du, W Hu, Z Li, RR Salakhutdinov, R Wang. Advances in Neural Information Processing Systems 32, 2019. Cited by 948.
Implicit regularization in deep matrix factorization. S Arora, N Cohen, W Hu, Y Luo. Advances in Neural Information Processing Systems 32, 2019. Cited by 521.
A convergence analysis of gradient descent for deep linear neural networks. S Arora, N Cohen, N Golowich, W Hu. arXiv preprint arXiv:1810.02281, 2018. Cited by 261.
Few-shot learning via learning the representation, provably. SS Du, W Hu, SM Kakade, JD Lee, Q Lei. arXiv preprint arXiv:2002.09434, 2020. Cited by 256.
Algorithmic regularization in learning deep homogeneous models: Layers are automatically balanced. SS Du, W Hu, JD Lee. Advances in Neural Information Processing Systems 31, 2018. Cited by 214.
An analysis of the t-SNE algorithm for data visualization. S Arora, W Hu, PK Kothari. Conference on Learning Theory, 1455-1462, 2018. Cited by 173.
Combinatorial multi-armed bandit with general reward functions. W Chen, W Hu, F Li, J Li, Y Liu, P Lu. Advances in Neural Information Processing Systems 29, 2016. Cited by 143.
Provable benefit of orthogonal initialization in optimizing deep linear networks. W Hu, L Xiao, J Pennington. arXiv preprint arXiv:2001.05992, 2020. Cited by 129.
Enhanced convolutional neural tangent kernels. Z Li, R Wang, D Yu, SS Du, W Hu, R Salakhutdinov, S Arora. arXiv preprint arXiv:1911.00809, 2019. Cited by 127.
Linear convergence of the primal-dual gradient method for convex-concave saddle point problems without strong convexity. SS Du, W Hu. The 22nd International Conference on Artificial Intelligence and Statistics …, 2019. Cited by 126.
Width provably matters in optimization for deep linear neural networks. S Du, W Hu. International Conference on Machine Learning, 1655-1664, 2019. Cited by 93.
Explaining landscape connectivity of low-cost solutions for multilayer nets. R Kuditipudi, X Wang, H Lee, Y Zhang, Z Li, W Hu, R Ge, S Arora. Advances in Neural Information Processing Systems 32, 2019. Cited by 87.
Simple and effective regularization methods for training on noisily labeled data with generalization guarantee. W Hu, Z Li, D Yu. arXiv preprint arXiv:1905.11368, 2019. Cited by 82.
The surprising simplicity of the early-time learning dynamics of neural networks. W Hu, L Xiao, B Adlam, J Pennington. Advances in Neural Information Processing Systems 33, 17116-17128, 2020. Cited by 71.
Linear convergence of a Frank-Wolfe type algorithm over trace-norm balls. Z Allen-Zhu, E Hazan, W Hu, Y Li. Advances in Neural Information Processing Systems 30, 2017. Cited by 63.
Impact of representation learning in linear bandits. J Yang, W Hu, JD Lee, SS Du. International Conference on Learning Representations, 2021. Cited by 58*.
More than a toy: Random matrix models predict how real-world neural representations generalize. A Wei, W Hu, J Steinhardt. International Conference on Machine Learning, 23549-23588, 2022. Cited by 55.
New characterizations in turnstile streams with applications. Y Ai, W Hu, Y Li, DP Woodruff. 31st Conference on Computational Complexity (CCC 2016), 2016. Cited by 45.
Near-optimal linear regression under distribution shift. Q Lei, W Hu, J Lee. International Conference on Machine Learning, 6164-6174, 2021. Cited by 40.