Causal reasoning of entities and events in procedural texts L Zhang, H Xu, Y Yang, S Zhou, W You, M Arora, C Callison-Burch arXiv preprint arXiv:2301.10896, 2023 | 17 | 2023 |
Human-in-the-loop schema induction T Zhang, I Tham, Z Hou, J Ren, L Zhou, H Xu, L Zhang, LJ Martin, R Dror, ... arXiv preprint arXiv:2302.13048, 2023 | 12 | 2023 |
OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models H Xu, R Zhao, L Zhu, J Du, Y He arXiv preprint arXiv:2402.06044, 2024 | 6 | 2024 |
Exploring the curious case of code prompts L Zhang, L Dugan, H Xu, C Callison-Burch arXiv preprint arXiv:2304.13250, 2023 | 6 | 2023 |
Large language models fall short: Understanding complex relationships in detective narratives R Zhao, Q Zhu, H Xu, J Li, Y Zhou, Y He, L Gui arXiv preprint arXiv:2402.11051, 2024 | 3 | 2024 |
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors L Dugan, A Hwang, F Trhlik, JM Ludan, A Zhu, H Xu, D Ippolito, ... arXiv preprint arXiv:2405.07940, 2024 | 2 | 2024 |
OpenPI2.0: An improved dataset for entity tracking in texts L Zhang, H Xu, A Kommula, C Callison-Burch, N Tandon arXiv preprint arXiv:2305.14603, 2023 | 2 | 2023 |
Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring J Li, H Xu, Z Sun, Y Zhou, D West, C Aloisi, Y He arXiv preprint arXiv:2406.19949, 2024 | | 2024 |
Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond X Wang, H Xu, L Gui, Y He arXiv preprint arXiv:2402.14522, 2024 | | 2024 |