Answer, assemble, ace: Understanding how transformers answer multiple choice questions

S Wiegreffe, O Tafjord, Y Belinkov, H Hajishirzi… - arXiv preprint arXiv …, 2024 - arxiv.org
Multiple-choice question answering (MCQA) is a key competence of performant transformer
language models that is tested by mainstream benchmarks. However, recent evidence …

Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey

Z Li, X Wu, H Du, H Nghiem, G Shi - arXiv preprint arXiv:2501.02189, 2025 - arxiv.org
Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …

Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?

N Balepur, R Rudinger - arXiv preprint arXiv:2407.01992, 2024 - arxiv.org
Recent work shows that large language models (LLMs) can answer multiple-choice
questions using only the choices, but does this mean that MCQA leaderboard rankings of …
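
A minimal sketch of the choices-only probe this abstract alludes to: score the same item with and without the question text, so above-chance accuracy in the second condition suggests the choices alone leak the answer. build_prompts and choices_only_accuracy are illustrative helpers, and predict stands in for any LLM scoring call; none of this is the paper's code.

```python
from typing import Callable

def build_prompts(question: str, options: dict[str, str]) -> tuple[str, str]:
    """Return (full, choices_only) prompts for the same MCQA item."""
    choices = "\n".join(f"{k}. {v}" for k, v in options.items())
    full = f"{question}\n{choices}\nAnswer:"
    choices_only = f"{choices}\nAnswer:"  # question text withheld on purpose
    return full, choices_only

def choices_only_accuracy(items, predict: Callable[[str], str]) -> float:
    """Fraction answered correctly from the choices alone; near 1/k is benign."""
    correct = 0
    for question, options, gold_letter in items:
        _, choices_only = build_prompts(question, options)
        correct += predict(choices_only) == gold_letter
    return correct / len(items)
```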

Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning

S Palta, N Balepur, P Rankel, S Wiegreffe… - arXiv preprint arXiv …, 2024 - arxiv.org
Questions involving commonsense reasoning about everyday situations often admit many
possible or plausible answers. In contrast, multiple-choice question …

Mitigating selection bias with node pruning and auxiliary options

HK Choi, W Xu, C Xue, S Eckman… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often show unwarranted preference for certain choice
options when responding to multiple-choice questions, posing significant reliability concerns …
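
A minimal sketch of one way such selection bias can be surfaced (not the paper's node-pruning method): rotate the answer texts through the letter slots and count how often each letter wins; a letter that wins far more often than chance indicates positional preference. predict is a hypothetical model call that maps an options dict to a predicted letter.

```python
from collections import Counter
from typing import Callable

def position_bias(options: dict[str, str],
                  predict: Callable[[dict[str, str]], str]) -> Counter:
    """Cycle answer texts through the letter slots and tally winning letters."""
    letters = list(options)
    texts = list(options.values())
    wins: Counter = Counter()
    for shift in range(len(texts)):
        rotated = dict(zip(letters, texts[shift:] + texts[:shift]))
        wins[predict(rotated)] += 1  # an unbiased model tracks content, not letters
    return wins
```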

Knowledge is power: Open-world knowledge representation learning for knowledge-based visual reasoning

W Zheng, L Yan, FY Wang - Artificial Intelligence, 2024 - Elsevier
Knowledge-based visual reasoning requires the ability to associate outside
knowledge that is not present in a given image for cross-modal visual understanding. Two …

Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?

N Balepur, F Gu, A Ravichander, S Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Question answering (QA), producing correct answers for input questions, is popular, but we
test a reverse question answering (RQA) task: given an input answer, generate a question …

Shortcut Learning in In-Context Learning: A Survey

R Song, Y Li, F Giunchiglia, H Xu - arXiv preprint arXiv:2411.02018, 2024 - arxiv.org
Shortcut learning refers to the phenomenon where models employ simple, non-robust
decision rules in practical tasks, which hinders their generalization and robustness. With the …

Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the US

C Acquaye, H An, R Rudinger - arXiv preprint arXiv:2410.16451, 2024 - arxiv.org
Recent work has highlighted the culturally-contingent nature of commonsense knowledge.
We introduce AMAMMERε, a test set of 525 multiple-choice questions designed …

Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA

E Tulchinskii, L Kushnareva, K Kuznetsov… - arXiv preprint arXiv …, 2024 - arxiv.org
A standard way to evaluate the abilities of LLMs involves presenting a multiple-choice
question and selecting the option with the highest logit as the model's predicted answer …
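
A minimal sketch of the highest-logit evaluation protocol this abstract describes, using Hugging Face transformers; gpt2 is a small stand-in model (an assumption, the paper studies larger LLMs), and the sketch assumes each option letter like " A" tokenizes to a single token.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def predict_option(question: str, options: dict[str, str]) -> str:
    """Pick the option letter whose token gets the highest next-token logit."""
    prompt = (question + "\n"
              + "\n".join(f"{k}. {v}" for k, v in options.items())
              + "\nAnswer:")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    letter_ids = {k: tokenizer.encode(" " + k)[0] for k in options}  # " A", " B", ...
    return max(letter_ids, key=lambda k: logits[letter_ids[k]].item())

print(predict_option("What is the capital of France?",
                     {"A": "Berlin", "B": "Paris", "C": "Rome", "D": "Madrid"}))
```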