| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| How much over-parameterization is sufficient to learn deep ReLU networks? | Z Chen, Y Cao, D Zou, Q Gu | arXiv preprint arXiv:1911.12360 | 133 | 2019 |
| Self-play fine-tuning converts weak language models to strong language models | Z Chen, Y Deng, H Yuan, K Ji, Q Gu | arXiv preprint arXiv:2401.01335 | 99 | 2024 |
| A generalized neural tangent kernel analysis for two-layer neural networks | Z Chen, Y Cao, Q Gu, T Zhang | Advances in Neural Information Processing Systems 33, 13363-13373 | 94* | 2020 |
| Benign overfitting in two-layer convolutional neural networks | Y Cao, Z Chen, M Belkin, Q Gu | Advances in Neural Information Processing Systems 35, 25237-25250 | 89 | 2022 |
| Towards understanding the mixture-of-experts layer in deep learning | Z Chen, Y Deng, Y Wu, Q Gu, Y Li | Advances in Neural Information Processing Systems 35, 23049-23062 | 85* | 2022 |
| Almost optimal algorithms for two-player zero-sum linear mixture Markov games | Z Chen, D Zhou, Q Gu | International Conference on Algorithmic Learning Theory, 227-261 | 59* | 2022 |
| Stein neural sampler | T Hu, Z Chen, H Sun, J Bai, M Ye, G Cheng | arXiv preprint arXiv:1810.03545 | 49 | 2018 |
| A general framework for sample-efficient function approximation in reinforcement learning | Z Chen, CJ Li, A Yuan, Q Gu, MI Jordan | arXiv preprint arXiv:2209.15634 | 35 | 2022 |
| How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? | J Wu, D Zou, Z Chen, V Braverman, Q Gu, PL Bartlett | arXiv preprint arXiv:2310.08391 | 32 | 2023 |
| Rephrase and respond: Let large language models ask better questions for themselves | Y Deng, W Zhang, Z Chen, Q Gu | arXiv preprint arXiv:2311.04205 | 29 | 2023 |
| Benign overfitting in two-layer ReLU convolutional neural networks | Y Kou, Z Chen, Y Chen, Q Gu | International Conference on Machine Learning, 17615-17659 | 23 | 2023 |
| Self-training converts weak learners to strong learners in mixture models | S Frei, D Zou, Z Chen, Q Gu | International Conference on Artificial Intelligence and Statistics, 8003-8021 | 18 | 2022 |
| Why does sharpness-aware minimization generalize better than SGD? | Z Chen, J Zhang, Y Kou, X Chen, CJ Hsieh, Q Gu | Advances in Neural Information Processing Systems 36 | 10 | 2024 |
| Implicit bias of gradient descent for two-layer ReLU and leaky ReLU networks on nearly-orthogonal data | Y Kou, Z Chen, Q Gu | Advances in Neural Information Processing Systems 36 | 9 | 2024 |
| Learning high-dimensional single-neuron ReLU networks with finite samples | J Wu, D Zou, Z Chen, V Braverman, Q Gu, SM Kakade | arXiv preprint arXiv:2303.02255 | 6* | 2023 |
| How Does Semi-supervised Learning with Pseudo-labelers Work? A Case Study | Y Kou, Z Chen, Y Cao, Q Gu | International Conference on Learning Representations | 6 | 2023 |
| Understanding transferable representation learning and zero-shot transfer in CLIP | Z Chen, Y Deng, Y Li, Q Gu | arXiv preprint arXiv:2310.00927 | 5 | 2023 |
| Faster perturbed stochastic gradient methods for finding local minima | Z Chen, D Zhou, Q Gu | International Conference on Algorithmic Learning Theory, 176-204 | 4 | 2022 |
| Understanding train-validation split in meta-learning with neural networks | X Zuo, Z Chen, H Yao, Y Cao, Q Gu | International Conference on Learning Representations | 3 | 2023 |
| Self-play fine-tuning of diffusion models for text-to-image generation | H Yuan, Z Chen, K Ji, Q Gu | arXiv preprint arXiv:2402.10210 | 2 | 2024 |