Post-hoc interpretability for neural NLP: A survey
Neural networks for NLP are becoming increasingly complex and widespread, and there is
growing concern about whether these models are responsible to use. Explaining models helps to address …
QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …
Holistic evaluation of language models
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …
Training-free structured diffusion guidance for compositional text-to-image synthesis
Large-scale diffusion models have achieved state-of-the-art results on text-to-image
synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we …
Are NLP models really able to solve simple math word problems?
The problem of designing NLP solvers for math word problems (MWP) has seen sustained
research activity and steady gains in test accuracy. Since existing solvers achieve high …
Dynabench: Rethinking benchmarking in NLP
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …
Towards a unified multi-dimensional evaluator for text generation
Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural
Language Generation (NLG), i.e., evaluating the generated text from multiple explainable …
Causal inference in natural language processing: Estimation, prediction, interpretation and beyond
A fundamental goal of scientific research is to learn about causal relationships. However,
despite its critical role in the life and social sciences, causality has not had the same …
Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks
The impressive performance of recent language models across a wide range of tasks
suggests that they possess a degree of abstract reasoning skills. Are these skills general …