Baichuan 2: Open large-scale language models A Yang, B Xiao, B Wang, B Zhang, C Bian, C Yin, C Lv, D Pan, D Wang, ... arXiv preprint arXiv:2309.10305, 2023 | 238* | 2023 |
BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset J Ji, M Liu, J Dai, X Pan, C Zhang, C Bian, B Chen, R Sun, Y Wang, ... Advances in Neural Information Processing Systems 36, 2024 | 104 | 2024 |
AI alignment: A comprehensive survey J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ... arXiv preprint arXiv:2310.19852, 2023 | 88 | 2023 |
Safe RLHF: Safe reinforcement learning from human feedback J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang, Y Yang arXiv preprint arXiv:2310.12773, 2023 | 60 | 2023 |
OmniSafe: An infrastructure for accelerating safe reinforcement learning research J Ji, J Zhou, B Zhang, J Dai, X Pan, R Sun, W Huang, Y Geng, M Liu, ... arXiv preprint arXiv:2305.09304, 2023 | 19 | 2023 |
MATE: Benchmarking multi-agent reinforcement learning in distributed target coverage control X Pan, M Liu, F Zhong, Y Yang, SC Zhu, Y Wang Advances in Neural Information Processing Systems 35, 27862-27879, 2022 | 19 | 2022 |
Safety gymnasium: A unified safe reinforcement learning benchmark J Ji, B Zhang, J Zhou, X Pan, W Huang, R Sun, Y Geng, Y Zhong, J Dai, ... Advances in Neural Information Processing Systems 36, 2023 | 14 | 2023 |
Safety-gymnasium J Ji, B Zhang, X Pan, J Zhou, J Dai, Y Yang GitHub repository, 2023 | 14 | 2023 |
Aligner: Achieving efficient alignment through weak-to-strong correction J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, J Dai, Y Yang arXiv preprint arXiv:2402.02416, 2024 | 13 | 2024 |
PKU-Beaver: Constrained value-aligned LLM via safe RLHF J Dai, X Pan, J Ji, R Sun, Y Wang, Y Yang | 10 | 2023 |
Proactive multi-camera collaboration for 3D human pose estimation H Ci, M Liu, X Pan, F Zhong, Y Wang arXiv preprint arXiv:2303.03767, 2023 | 7 | 2023 |
TorchOpt: An efficient library for differentiable optimization J Ren, X Feng, B Liu, X Pan, Y Fu, L Mai, Y Yang Journal of Machine Learning Research 24 (367), 1-14, 2023 | 7 | 2023 |
Red teaming game: A game-theoretic framework for red teaming language models C Ma, Z Yang, M Gao, H Ci, J Gao, X Pan, Y Yang arXiv preprint arXiv:2310.00322, 2023 | 6 | 2023 |
Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective T Qiu, F Zeng, J Ji, D Yan, K Wang, J Zhou, H Yang, J Dai, X Pan, Y Yang arXiv preprint arXiv:2402.10184, 2024 | 4 | 2024 |