DebateQA: Evaluating question answering on debatable knowledge

R Xu, X Qi, Z Qi, W Xu, Z Guo - arXiv preprint arXiv:2408.01419, 2024 - arxiv.org
The rise of large language models (LLMs) has enabled us to seek answers to inherently
debatable questions on LLM chatbots, necessitating a reliable way to evaluate their ability …