| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Baichuan 2: Open large-scale language models | A Yang, B Xiao, B Wang, B Zhang, C Bian, C Yin, C Lv, D Pan, D Wang, ... | arXiv preprint arXiv:2309.10305, 2023 | 231 | 2023 |
| BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset | J Ji, M Liu, J Dai, X Pan, C Zhang, C Bian, B Chen, R Sun, Y Wang, ... | Advances in Neural Information Processing Systems 36, 2024 | 141 | 2024 |
| Safe RLHF: Safe reinforcement learning from human feedback | J Dai*, X Pan*, R Sun*, J Ji*, X Xu, M Liu, Y Wang, Y Yang | arXiv preprint arXiv:2310.12773, 2023 | 94 | 2023 |
| Safety Gymnasium: A unified safe reinforcement learning benchmark | J Ji, B Zhang, J Zhou, X Pan, W Huang, R Sun, Y Geng, Y Zhong, J Dai, ... | Advances in Neural Information Processing Systems 36, 2023 | 27 | 2023 |
| OmniSafe: An infrastructure for accelerating safe reinforcement learning research | J Ji, J Zhou, B Zhang, J Dai, X Pan, R Sun, W Huang, Y Geng, M Liu, ... | arXiv preprint arXiv:2305.09304, 2023 | 24 | 2023 |
| PKU-Beaver: Constrained value-aligned LLM via Safe RLHF | J Dai, X Pan, J Ji, R Sun, Y Wang, Y Yang | | 12 | 2023 |