A survey on evaluation of large language models Y Chang, X Wang, J Wang, Y Wu, L Yang, K Zhu, H Chen, X Yi, C Wang, ... ACM Transactions on Intelligent Systems and Technology, 2023 | 716 | 2023 |
PromptBench: Towards evaluating the robustness of large language models on adversarial prompts K Zhu, J Wang, J Zhou, Z Wang, H Chen, Y Wang, L Yang, W Ye, Y Zhang, ... arXiv preprint arXiv:2306.04528, 2023 | 123 | 2023 |
The good, the bad, and why: Unveiling emotions in generative AI C Li, J Wang, Y Zhang, K Zhu, X Wang, W Hou, J Lian, F Luo, Q Yang, ... International Conference on Machine Learning (ICML), 2023 | 67* | 2023 |
CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents Q Zhao, J Wang, Y Zhang, Y Jin, K Zhu, H Chen, X Xie International Conference on Machine Learning (ICML) | 14* | |
DyVal: Graph-informed dynamic evaluation of large language models K Zhu, J Chen, J Wang, NZ Gong, D Yang, X Xie International Conference on Learning Representations (ICLR), 2023 | 10 | 2023 |
PromptBench: A unified library for evaluation of large language models K Zhu, Q Zhao, H Chen, J Wang, X Xie arXiv preprint arXiv:2312.07910, 2023 | 9 | 2023 |
Improving generalization of adversarial training via robust critical fine-tuning K Zhu, X Hu, J Wang, X Xie, G Yang International Conference on Computer Vision (ICCV), 2023 | 9 | 2023 |
DyVal 2: Dynamic Evaluation of Large Language Models by Meta Probing Agents K Zhu, J Wang, Q Zhao, R Xu, X Xie International Conference on Machine Learning (ICML), 2024 | 4 | 2024 |
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models L Fan, W Hua, X Li, K Zhu, M Jin, L Li, H Ling, J Chi, J Wang, X Ma, ... arXiv preprint arXiv:2403.01777, 2024 | 2 | 2024 |
Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities W Hua, K Zhu, L Li, L Fan, S Lin, M Jin, H Xue, Z Li, JD Wang, Y Zhang arXiv preprint arXiv:2406.02787, 2024 | | 2024 |