Safety assessment of Chinese large language models H Sun, Z Zhang, J Deng, J Cheng, M Huang arXiv preprint arXiv:2304.10436, 2023 | 61 | 2023 |
On the safety of conversational models: Taxonomy, dataset, and benchmark H Sun, G Xu, J Deng, J Cheng, C Zheng, H Zhou, N Peng, X Zhu, ... arXiv preprint arXiv:2110.08466, 2021 | 55 | 2021 |
COLD: A benchmark for Chinese offensive language detection J Deng, J Zhou, H Sun, C Zheng, F Mi, H Meng, M Huang arXiv preprint arXiv:2201.06025, 2022 | 54 | 2022 |
PsyQA: A Chinese dataset for generating long counseling text for mental health support H Sun, Z Lin, C Zheng, S Liu, M Huang arXiv preprint arXiv:2106.01702, 2021 | 49 | 2021 |
EVA: An open-domain Chinese dialogue system with large-scale generative pre-training H Zhou, P Ke, Z Zhang, Y Gu, Y Zheng, C Zheng, Y Wang, CH Wu, H Sun, ... arXiv preprint arXiv:2108.01547, 2021 | 45 | 2021 |
EVA2.0: Investigating open-domain Chinese dialogue systems with large-scale pre-training Y Gu, J Wen, H Sun, Y Song, P Ke, C Zheng, Z Zhang, J Yao, L Liu, X Zhu, ... Machine Intelligence Research 20 (2), 207-219, 2023 | 37 | 2023 |
Recent advances towards safe, responsible, and moral dialogue systems: A survey J Deng, H Sun, Z Zhang, J Cheng, M Huang arXiv preprint arXiv:2302.09270, 2023 | 26 | 2023 |
Unveiling the implicit toxicity in large language models J Wen, P Ke, H Sun, Z Zhang, C Li, J Bai, M Huang arXiv preprint arXiv:2311.17391, 2023 | 17 | 2023 |
PAL: Persona-augmented emotional support conversation generation J Cheng, S Sabour, H Sun, Z Chen, M Huang arXiv preprint arXiv:2212.09235, 2022 | 14 | 2022 |
MoralDial: A framework to train and evaluate moral dialogue systems via moral discussions H Sun, Z Zhang, F Mi, Y Wang, W Liu, J Cui, B Wang, Q Liu, M Huang arXiv preprint arXiv:2212.10720, 2022 | 9 | 2022 |
Constructing highly inductive contexts for dialogue safety through controllable reverse generation Z Zhang, J Cheng, H Sun, J Deng, F Mi, Y Wang, L Shang, M Huang arXiv preprint arXiv:2212.01810, 2022 | 7 | 2022 |
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors Z Zhang, Y Lu, J Ma, D Zhang, R Li, P Ke, H Sun, L Sha, Z Sui, H Wang, ... arXiv preprint arXiv:2402.16444, 2024 | 2 | 2024 |
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning Z Zhang, J Cheng, H Sun, J Deng, M Huang Findings of the Association for Computational Linguistics: EMNLP 2023, 10421 …, 2023 | 2 | 2023 |
Enhancing Offensive Language Detection with Data Augmentation and Knowledge Distillation J Deng, Z Chen, H Sun, Z Zhang, J Wu, S Nakagawa, F Ren, M Huang Research 6, 0189, 2023 | 1 | 2023 |
MoralDial: A framework to train and evaluate moral dialogue systems via constructing moral discussions H Sun, Z Zhang, F Mi, Y Wang, W Liu, J Cui, B Wang, Q Liu, M Huang arXiv preprint arXiv:2212.10720, 2022 | 1 | 2022 |