From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI
The rising popularity of explainable artificial intelligence (XAI) to understand high-performing
black boxes raised the question of how to evaluate explanations of machine learning (ML) …
Towards human-centered explainable AI: A survey of user studies for model explanations
Explainable AI (XAI) is widely viewed as a sine qua non for ever-expanding AI research. A
better understanding of the needs of XAI users, as well as human-centered evaluations of …
Challenging big-bench tasks and whether chain-of-thought can solve them
BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks
believed to be beyond the capabilities of current language models. Language models have …
Chain-of-thought prompting elicits reasoning in large language models
We explore how generating a chain of thought---a series of intermediate reasoning steps---
significantly improves the ability of large language models to perform complex reasoning. In …
Can language models learn from explanations in context?
Language Models (LMs) can perform new tasks by adapting to a few in-context examples.
For humans, explanations that connect examples to task principles can improve learning …
STaR: Bootstrapping reasoning with reasoning
Generating step-by-step "chain-of-thought" rationales improves language model
performance on complex reasoning tasks like mathematics or commonsense question …
Self-evaluation guided beam search for reasoning
Breaking down a problem into intermediate steps has demonstrated impressive
performance in Large Language Model (LLM) reasoning. However, the growth of the …
Symbolic chain-of-thought distillation: Small models can also "think" step-by-step
Chain-of-thought prompting (e.g., "Let's think step-by-step") primes large language models to
verbalize rationalization for their predictions. While chain-of-thought can lead to dramatic …
When can models learn from explanations? A formal framework for understanding the roles of explanation data
Many methods now exist for conditioning model outputs on task instructions, retrieved
documents, and user-provided explanations and feedback. Rather than relying solely on …
Improved logical reasoning of language models via differentiable symbolic programming
Pre-trained large language models (LMs) struggle to perform logical reasoning reliably
despite advances in scale and compositionality. In this work, we tackle this challenge …