Large language model alignment: A survey T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo, X Wu, Y Liu, D Xiong arXiv preprint arXiv:2309.15025, 2023 | 58 | 2023 |
Unbiased learning to rank in feeds recommendation X Wu, H Chen, J Zhao, L He, D Yin, Y Chang Proceedings of the 14th ACM International Conference on Web Search and Data …, 2021 | 36 | 2021 |
DEPN: Detecting and editing privacy neurons in pretrained language models X Wu, J Li, M Xu, W Dong, S Wu, C Bian, D Xiong arXiv preprint arXiv:2310.20138, 2023 | 29 | 2023 |
Adaptive differential privacy for language model training X Wu, L Gong, D Xiong Proceedings of the First Workshop on Federated Learning for Natural Language …, 2022 | 6 | 2022 |
FewFedWeight: Few-shot federated learning framework across multiple NLP tasks W Dong, X Wu, J Li, S Wu, C Bian, D Xiong arXiv preprint arXiv:2212.08354, 2022 | 5 | 2022 |
Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages? S Xu, W Dong, Z Guo, X Wu, D Xiong arXiv preprint arXiv:2402.18120, 2024 | 3 | 2024 |
Swing distillation: A privacy-preserving knowledge distillation framework J Li, X Wu, W Dong, S Wu, C Bian, D Xiong arXiv preprint arXiv:2212.08349, 2022 | 3 | 2022 |
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation W Dong, X Wu, R Jin, S Xu, D Xiong arXiv preprint arXiv:2405.13578, 2024 | 1 | 2024 |
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons D Shi, R Jin, T Shen, W Dong, X Wu, D Xiong arXiv preprint arXiv:2406.18406, 2024 | | 2024 |