Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers
The integrity of AI benchmarks is fundamental to accurately assessing the capabilities of AI
systems. The internal validity of these benchmarks, i.e., making sure they are free from …
LLMs Are Prone to Fallacies in Causal Inference
Recent work shows that causal facts can be effectively extracted from LLMs through
prompting, facilitating the creation of causal graphs for causal inference tasks. However, it is …
[PDF] On the Limitations of Zero-Shot Classification of Causal Relations by LLMs (Work in Progress)
V Kanjirangat, A Antonucci, M Zaalon - Proceedings http://ceur-ws …, 2024 - people.idsia.ch
We aim to explore and analyze the capabilities and limitations of the large language models
in understanding and distinguishing causal sentences under a zero-shot setting. We …
[PDF] Developing Benchmark for Causal Representation Learning in LLMs: An Informal Write-Up 2
C Guo - chengguo2000.github.io
After conducting a primitive literature review on causality and LLMs, I believe that further
research should focus beyond inferring explicit causal relationships, but rather on the …
[PDF] My Idea on Developing a New Benchmark for Causal Inference in LLMs: An Informal Write-Up
C Guo - chengguo2000.github.io
My name is Cheng Guo and I am a first-year Master's student studying Computer Science at
the University of California, San Diego. I am dedicated to Causality and LLM research and …
[PDF] My Idea on Developing a New Benchmark for Causal Inference in LLMs
C Guo - chengguo2000.github.io
My Idea on Developing a New Benchmark for Causal Inference in LLMs. Cheng Guo.
Overview: Who …