The rise and potential of large language model based agents: A survey Z Xi, W Chen, X Guo, W He, Y Ding, B Hong, M Zhang, J Wang, S Jin, ... arXiv preprint arXiv:2309.07864, 2023 | 458 | 2023 |
Secrets of RLHF in large language models part II: Reward modeling B Wang, R Zheng, L Chen, Y Liu, S Dou, C Huang, W Shen, S Jin, E Zhou, ... arXiv preprint arXiv:2401.06080, 2024 | 32 | 2024 |
LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models via MoE-Style Plugin S Dou, E Zhou, Y Liu, S Gao, W Shen, L Xiong, Y Zhou, X Wang, Z Xi, ... Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024 | 17* | 2024 |
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback S Dou, Y Liu, H Jia, L Xiong, E Zhou, J Shan, C Huang, W Shen, X Fan, ... arXiv preprint arXiv:2402.01391, 2024 | 5 | 2024 |
MetaRM: Shifted Distributions Alignment via Meta-Learning S Dou, Y Liu, E Zhou, T Li, H Jia, L Xiong, X Zhao, J Ye, R Zheng, T Gui, ... arXiv preprint arXiv:2405.00438, 2024 | 1 | 2024 |
RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms E Zhou, R Zheng, Z Xi, S Gao, X Fan, Z Fei, J Ye, T Gui, Q Zhang, ... arXiv preprint arXiv:2310.11227, 2023 | | 2023 |