Zhexin Zhang
Verified email at mails.tsinghua.edu.cn
Title
Cited by
Year
Safety assessment of Chinese large language models
H Sun, Z Zhang, J Deng, J Cheng, M Huang
arXiv preprint arXiv:2304.10436, 2023
Cited by: 67 | Year: 2023
SafetyBench: Evaluating the safety of large language models with multiple choice questions
Z Zhang, L Lei, L Wu, R Sun, Y Huang, C Long, X Liu, X Lei, J Tang, ...
ACL 2024, 2023
Cited by: 51 | Year: 2023
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
J Guan, Z Zhang, Z Feng, Z Liu, W Ding, X Mao, C Fan, M Huang
ACL 2021, 2021
Cited by: 41 | Year: 2021
Recent advances towards safe, responsible, and moral dialogue systems: A survey
J Deng, H Sun, Z Zhang, J Cheng, M Huang
arXiv preprint arXiv:2302.09270, 2023
Cited by: 27 | Year: 2023
Unveiling the implicit toxicity in large language models
J Wen, P Ke, H Sun, Z Zhang, C Li, J Bai, M Huang
EMNLP 2023, 2023
Cited by: 26 | Year: 2023
Defending large language models against jailbreaking attacks through goal prioritization
Z Zhang, J Yang, P Ke, M Huang
ACL 2024, 2023
Cited by: 26 | Year: 2023
Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation
Z Zhang, J Wen, J Guan, M Huang
NAACL 2022, 2022
Cited by: 17 | Year: 2022
MoralDial: A framework to train and evaluate moral dialogue systems via moral discussions
H Sun, Z Zhang, F Mi, Y Wang, W Liu, J Cui, B Wang, Q Liu, M Huang
ACL 2023, 2022
Cited by: 11 | Year: 2022
Ethicist: Targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation
Z Zhang, J Wen, M Huang
ACL 2023, 2023
Cited by: 10 | Year: 2023
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Z Zhang, J Cheng, H Sun, J Deng, F Mi, Y Wang, L Shang, M Huang
EMNLP 2022 Findings, 2022
Cited by: 8 | Year: 2022
Automatic comment generation for Chinese student narrative essays
Z Zhang, J Guan, G Xu, Y Tian, M Huang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022
Cited by: 4 | Year: 2022
ShieldLM: Empowering LLMs as aligned, customizable and explainable safety detectors
Z Zhang, Y Lu, J Ma, D Zhang, R Li, P Ke, H Sun, L Sha, Z Sui, H Wang, ...
arXiv preprint arXiv:2402.16444, 2024
Cited by: 2 | Year: 2024
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning
Z Zhang, J Cheng, H Sun, J Deng, M Huang
Findings of the Association for Computational Linguistics: EMNLP 2023, 10421 …, 2023
Cited by: 2 | Year: 2023
Enhancing Offensive Language Detection with Data Augmentation and Knowledge Distillation
J Deng, Z Chen, H Sun, Z Zhang, J Wu, S Nakagawa, F Ren, M Huang
Research 6, 0189, 2023
Cited by: 1 | Year: 2023
Selecting Stickers in Open-Domain Dialogue through Multitask Learning
Z Zhang, Y Zhu, Z Fei, J Zhang, J Zhou
ACL 2022 Findings, 2022
Cited by: 1 | Year: 2022
MoralDial: A framework to train and evaluate moral dialogue systems via constructing moral discussions
H Sun, Z Zhang, F Mi, Y Wang, W Liu, J Cui, B Wang, Q Liu, M Huang
arXiv preprint arXiv:2212.10720, 2022
Cited by: 1 | Year: 2022
Self-Supervised Sentence Polishing by Adding Engaging Modifiers
Z Zhang, J Guan, X Cui, Y Ran, B Liu, M Huang
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Year: 2023
Articles 1–17