Safety assessment of Chinese large language models H Sun, Z Zhang, J Deng, J Cheng, M Huang arXiv preprint arXiv:2304.10436, 2023 | 67 | 2023 |
SafetyBench: Evaluating the safety of large language models with multiple choice questions Z Zhang, L Lei, L Wu, R Sun, Y Huang, C Long, X Liu, X Lei, J Tang, ... ACL 2024, 2023 | 51 | 2023 |
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics J Guan, Z Zhang, Z Feng, Z Liu, W Ding, X Mao, C Fan, M Huang ACL 2021, 2021 | 41 | 2021 |
Recent advances towards safe, responsible, and moral dialogue systems: A survey J Deng, H Sun, Z Zhang, J Cheng, M Huang arXiv preprint arXiv:2302.09270, 2023 | 27 | 2023 |
Unveiling the implicit toxicity in large language models J Wen, P Ke, H Sun, Z Zhang, C Li, J Bai, M Huang EMNLP 2023, 2023 | 26 | 2023 |
Defending large language models against jailbreaking attacks through goal prioritization Z Zhang, J Yang, P Ke, M Huang ACL 2024, 2023 | 26 | 2023 |
Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation Z Zhang, J Wen, J Guan, M Huang NAACL 2022, 2022 | 17 | 2022 |
MoralDial: A framework to train and evaluate moral dialogue systems via moral discussions H Sun, Z Zhang, F Mi, Y Wang, W Liu, J Cui, B Wang, Q Liu, M Huang ACL 2023, 2022 | 11 | 2022 |
Ethicist: Targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation Z Zhang, J Wen, M Huang ACL 2023, 2023 | 10 | 2023 |
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation Z Zhang, J Cheng, H Sun, J Deng, F Mi, Y Wang, L Shang, M Huang EMNLP 2022 Findings, 2022 | 8 | 2022 |
Automatic comment generation for Chinese student narrative essays Z Zhang, J Guan, G Xu, Y Tian, M Huang Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 4 | 2022 |
ShieldLM: Empowering LLMs as aligned, customizable and explainable safety detectors Z Zhang, Y Lu, J Ma, D Zhang, R Li, P Ke, H Sun, L Sha, Z Sui, H Wang, ... arXiv preprint arXiv:2402.16444, 2024 | 2 | 2024 |
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning Z Zhang, J Cheng, H Sun, J Deng, M Huang Findings of the Association for Computational Linguistics: EMNLP 2023, 10421 …, 2023 | 2 | 2023 |
Enhancing Offensive Language Detection with Data Augmentation and Knowledge Distillation J Deng, Z Chen, H Sun, Z Zhang, J Wu, S Nakagawa, F Ren, M Huang Research 6, 0189, 2023 | 1 | 2023 |
Selecting Stickers in Open-Domain Dialogue through Multitask Learning Z Zhang, Y Zhu, Z Fei, J Zhang, J Zhou ACL 2022 Findings, 2022 | 1 | 2022 |
MoralDial: A framework to train and evaluate moral dialogue systems via constructing moral discussions H Sun, Z Zhang, F Mi, Y Wang, W Liu, J Cui, B Wang, Q Liu, M Huang arXiv preprint arXiv:2212.10720, 2022 | 1 | 2022 |
Self-Supervised Sentence Polishing by Adding Engaging Modifiers Z Zhang, J Guan, X Cui, Y Ran, B Liu, M Huang Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | | 2023 |