Post-hoc interpretability for neural NLP: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is a
growing concern if these models are responsible to use. Explaining models helps to address …

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Interactive and visual prompt engineering for ad-hoc task adaptation with large language models

H Strobelt, A Webson, V Sanh, B Hoover… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
State-of-the-art neural language models can now be used to solve ad-hoc language tasks
through zero-shot prompting without the need for supervised training. This approach has …

PromptAid: Prompt exploration, perturbation, testing and iteration using visual analytics for large language models

A Mishra, U Soni, A Arunkumar, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have gained widespread popularity due to their ability to
perform ad-hoc Natural Language Processing (NLP) tasks with a simple natural language …

Explaining machine learning models with interactive natural language conversations using TalkToModel

D Slack, S Krishna, H Lakkaraju, S Singh - Nature Machine Intelligence, 2023 - nature.com
Practitioners increasingly use machine learning (ML) models, yet models have become
more complex and harder to understand. To understand complex models, researchers have …

Using natural language processing to support peer‐feedback in the age of artificial intelligence: a cross‐disciplinary framework and a research agenda

E Bauer, M Greisel, I Kuznetsov… - British Journal of …, 2023 - Wiley Online Library
Advancements in artificial intelligence are rapidly increasing. The new‐generation large
language models, such as ChatGPT and GPT‐4, bear the potential to transform educational …

Robustness Gym: Unifying the NLP evaluation landscape

K Goel, N Rajani, J Vig, S Tan, J Wu, S Zheng… - arXiv preprint arXiv …, 2021 - arxiv.org
Despite impressive performance on standard benchmarks, deep neural networks are often
brittle when deployed in real-world systems. Consequently, recent research has focused on …

Discovering the syntax and strategies of natural language programming with generative language models

E Jiang, E Toh, A Molina, K Olson, C Kayacik… - Proceedings of the …, 2022 - dl.acm.org
In this paper, we present a natural language code synthesis tool, GenLine, backed by 1) a
large generative language model and 2) a set of task-specific prompts that create or change …

Bias and unfairness in machine learning models: a systematic review on datasets, tools, fairness metrics, and identification and mitigation methods

TP Pagano, RB Loureiro, FVN Lisboa… - Big data and cognitive …, 2023 - mdpi.com
One of the difficulties of artificial intelligence is to ensure that model decisions are fair and
free of bias. In research, datasets, metrics, techniques, and tools are applied to detect and …

Understanding the effect of out-of-distribution examples and interactive explanations on human-AI decision making

H Liu, V Lai, C Tan - Proceedings of the ACM on Human-Computer …, 2021 - dl.acm.org
Although AI holds promise for improving human decision making in societally critical
domains, it remains an open question how human-AI teams can reliably outperform AI alone …