Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers

L Pacchiardi, M Tesic, LG Cheke… - arXiv preprint arXiv …, 2024 - arxiv.org
The integrity of AI benchmarks is fundamental to accurately assess the capabilities of AI
systems. The internal validity of these benchmarks, i.e., making sure they are free from …

LLMs Are Prone to Fallacies in Causal Inference

N Joshi, A Saparov, Y Wang, H He - arXiv preprint arXiv:2406.12158, 2024 - arxiv.org
Recent work shows that causal facts can be effectively extracted from LLMs through
prompting, facilitating the creation of causal graphs for causal inference tasks. However, it is …

[PDF] On the Limitations of Zero-Shot Classification of Causal Relations by LLMs (Work in Progress)

V Kanjirangat, A Antonucci, M Zaalon - Proceedings http://ceur-ws …, 2024 - people.idsia.ch
We aim to explore and analyze the capabilities and limitations of the large language models
in understanding and distinguishing causal sentences under a zero-shot setting. We …

[PDF] Developing Benchmark for Causal Representation Learning in LLMs: An Informal Write-Up 2

C Guo - chengguo2000.github.io
After conducting a primitive literature review on causality and LLMs, I believe that further
research should focus beyond inferring explicit causal relationships, but rather on the …

[PDF] My Idea on Developing a New Benchmark for Causal Inference in LLMs: An Informal Write-Up

C Guo - chengguo2000.github.io
My name is Cheng Guo and I am a first-year Master's student studying Computer Science at
the University of California, San Diego. I am dedicated to Causality and LLM research and …

[PDF] My Idea on Developing a New Benchmark for Causal Inference in LLMs

C Guo - chengguo2000.github.io
My Idea on Developing a New Benchmark for Causal Inference in LLMs. Cheng Guo. Overview • Who …