| Publication | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| DeBERTa: Decoding-enhanced BERT with disentangled attention | P He, X Liu, J Gao, W Chen | ICLR 2021 | 2370 | 2020 |
| On the variance of the adaptive learning rate and beyond | L Liu, H Jiang, P He, W Chen, X Liu, J Gao, J Han | ICLR 2020 | 2168 | 2019 |
| Multi-task deep neural networks for natural language understanding | X Liu, P He, W Chen, J Gao | ACL 2019 | 1398 | 2019 |
| DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing | P He, J Gao, W Chen | ICLR 2023 | 777 | 2021 |
| Instruction tuning with GPT-4 | B Peng, C Li, P He, M Galley, J Gao | arXiv preprint arXiv:2304.03277 | 616 | 2023 |
| SMART: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization | H Jiang, P He, W Chen, X Liu, J Gao, T Zhao | ACL 2020 | 460 | 2019 |
| Check your facts and try again: Improving large language models with external knowledge and automated feedback | B Peng, M Galley, P He, H Cheng, Y Xie, Y Hu, Q Huang, L Liden, Z Yu, ... | arXiv preprint arXiv:2302.12813 | 317 | 2023 |
| AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning | Q Zhang, M Chen, A Bukharin, N Karampatziakis, P He, Y Cheng, ... | arXiv preprint arXiv:2303.10512 | 247 | 2023 |
| Improving multi-task deep neural networks via knowledge distillation for natural language understanding | X Liu, P He, W Chen, J Gao | arXiv preprint arXiv:1904.09482 | 210 | 2019 |
| Generation-augmented retrieval for open-domain question answering | Y Mao, P He, X Liu, Y Shen, J Gao, J Han, W Chen | arXiv preprint arXiv:2009.08553 | 197 | 2020 |
| Diffusion-GAN: Training GANs with diffusion | Z Wang, H Zheng, P He, W Chen, M Zhou | ICLR 2023 | 184 | 2022 |
| Adversarial training for large neural language models | X Liu, H Cheng, P He, W Chen, Y Wang, H Poon, J Gao | arXiv preprint arXiv:2004.08994 | 179 | 2020 |
| DoLa: Decoding by contrasting layers improves factuality in large language models | YS Chuang, Y Xie, H Luo, Y Kim, J Glass, P He | arXiv preprint arXiv:2309.03883 | 124 | 2023 |
| X-SQL: Reinforce schema representation with context | P He, Y Mao, K Chakrabarti, W Chen | arXiv preprint arXiv:1908.08113 | 106 | 2019 |
| Patch diffusion: Faster and more data-efficient training of diffusion models | Z Wang, Y Jiang, H Zheng, P Wang, P He, Z Wang, W Chen, M Zhou | Advances in Neural Information Processing Systems 36 | 102 | 2024 |
| Query rewriting for retrieval-augmented large language models | X Ma, Y Gong, P He, H Zhao, N Duan | arXiv preprint arXiv:2305.14283 | 82 | 2023 |
| NeurIPS 2020 EfficientQA competition: Systems, analyses and lessons learned | S Min, J Boyd-Graber, C Alberti, D Chen, E Choi, M Collins, K Guu, ... | NeurIPS 2020 | 71 | 2021 |
| LoftQ: LoRA-fine-tuning-aware quantization for large language models | Y Li, Y Yu, C Liang, P He, N Karampatziakis, W Chen, T Zhao | arXiv preprint arXiv:2310.08659 | 70 | 2023 |
| PLATON: Pruning large transformer models with upper confidence bound of weight importance | Q Zhang, S Zuo, C Liang, A Bukharin, P He, W Chen, T Zhao | International Conference on Machine Learning, 26809-26823 | 66 | 2022 |