The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects Z Zhu, J Wu, B Yu, L Wu, J Ma International Conference on Machine Learning, 7654-7663, 2019 | 265* | 2019 |
On the Noisy Gradient Descent that Generalizes as SGD J Wu, W Hu, H Xiong, J Huan, V Braverman, Z Zhu International Conference on Machine Learning, 10367-10376, 2020 | 105 | 2020 |
Programmable packet scheduling with a single queue Z Yu, C Hu, J Wu, X Sun, V Braverman, M Chowdhury, Z Liu, X Jin Proceedings of the 2021 ACM SIGCOMM 2021 Conference, 179-193, 2021 | 79 | 2021 |
Benign overfitting of constant-stepsize SGD for linear regression D Zou, J Wu, V Braverman, Q Gu, SM Kakade Journal of Machine Learning Research 24 (326), 1-58, 2023 | 62 | 2023 |
Direction matters: On the implicit bias of stochastic gradient descent with moderate learning rate J Wu, D Zou, V Braverman, Q Gu International Conference on Learning Representations, 2021 | 37 | 2021 |
Tangent-normal adversarial regularization for semi-supervised learning B Yu, J Wu, J Ma, Z Zhu Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2019 | 37 | 2019 |
Twenty years after: Hierarchical Core-Stateless fair queueing Z Yu, J Wu, V Braverman, I Stoica, X Jin 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2021 | 35 | 2021 |
The benefits of implicit regularization from SGD in least squares problems D Zou, J Wu, V Braverman, Q Gu, DP Foster, S Kakade Advances in Neural Information Processing Systems 34, 5456-5468, 2021 | 30 | 2021 |
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? J Wu, D Zou, Z Chen, V Braverman, Q Gu, PL Bartlett arXiv preprint arXiv:2310.08391, 2023 | 28 | 2023 |
Last iterate risk bounds of SGD with decaying stepsize for overparameterized linear regression J Wu, D Zou, V Braverman, Q Gu, S Kakade International Conference on Machine Learning, 24280-24314, 2022 | 24 | 2022 |
Ship compute or ship data? why not both? J You, J Wu, X Jin, M Chowdhury 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2021 | 19 | 2021 |
Lifelong learning with sketched structural regularization H Li, A Krishnan, J Wu, S Kolouri, PK Pilly, V Braverman Asian Conference on Machine Learning, 985-1000, 2021 | 17 | 2021 |
The power and limitation of pretraining-finetuning for linear regression under covariate shift J Wu, D Zou, V Braverman, Q Gu, S Kakade Advances in Neural Information Processing Systems 35, 33041-33053, 2022 | 16 | 2022 |
Accommodating picky customers: Regret bound and exploration complexity for multi-objective reinforcement learning J Wu, V Braverman, L Yang Advances in Neural Information Processing Systems 34, 13112-13124, 2021 | 15 | 2021 |
Gap-dependent unsupervised exploration for reinforcement learning J Wu, V Braverman, L Yang International Conference on Artificial Intelligence and Statistics, 4109-4131, 2022 | 14 | 2022 |
Implicit bias of gradient descent for logistic regression at the edge of stability J Wu, V Braverman, JD Lee Advances in Neural Information Processing Systems 36, 2024 | 12 | 2024 |
Fixed design analysis of regularization-based continual learning H Li, J Wu, V Braverman Conference on Lifelong Learning Agents, 513-533, 2023 | 6 | 2023 |
Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron J Wu, D Zou, Z Chen, V Braverman, Q Gu, SM Kakade International Conference on Machine Learning, 2023 | 6* | 2023 |
Risk bounds of multi-pass SGD for least squares in the interpolation regime D Zou, J Wu, V Braverman, Q Gu, S Kakade Advances in Neural Information Processing Systems 35, 12909-12920, 2022 | 6 | 2022 |
Obtaining Adjustable Regularization for Free via Iterate Averaging J Wu, V Braverman, L Yang International Conference on Machine Learning, 10344-10354, 2020 | 5 | 2020 |