Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models

T Wu, MT Ribeiro, J Heer, DS Weld - arXiv preprint arXiv:2101.00288, 2021 - arxiv.org
While counterfactual examples are useful for analysis and training of NLP models, current
generation methods either rely on manual labor to create very few counterfactuals, or only …

PaTAT: Human-AI collaborative qualitative coding with explainable interactive rule synthesis

SA Gebreegziabher, Z Zhang, X Tang, Y Meng… - Proceedings of the …, 2023 - dl.acm.org
Over the years, the task of AI-assisted data annotation has seen remarkable advancements.
However, a specific type of annotation task, the qualitative coding performed during thematic …

MultiCoNER: A large-scale multilingual dataset for complex named entity recognition

S Malmasi, A Fang, B Fetahu, S Kar… - arXiv preprint arXiv …, 2022 - arxiv.org
We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that
covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as …

GEMNET: Effective gated gazetteer representations for recognizing complex entities in low-context input

T Meng, A Fang, O Rokhlenko… - Proceedings of the 2021 …, 2021 - aclanthology.org
Named Entity Recognition (NER) remains difficult in real-world settings; current
challenges include short texts (low context), emerging entities, and complex entities (e.g. …

ScatterShot: Interactive in-context example curation for text transformation

S Wu, H Shen, DS Weld, J Heer… - Proceedings of the 28th …, 2023 - dl.acm.org
The in-context learning capabilities of LLMs like GPT-3 allow annotators to customize an
LLM to their specific tasks with a small number of examples. However, users tend to include …

Supporting Sensemaking of Large Language Model Outputs at Scale

KI Gero, C Swoopes, Z Gu, JK Kummerfeld… - Proceedings of the CHI …, 2024 - dl.acm.org
Large language models (LLMs) are capable of generating multiple responses to a single
prompt, yet little effort has been expended to help end-users or system designers make use …

Intuitively assessing ML model reliability through example-based explanations and editing model inputs

H Suresh, KM Lewis, J Guttag… - Proceedings of the 27th …, 2022 - dl.acm.org
Interpretability methods aim to help users build trust in and understand the capabilities of
machine learning models. However, existing approaches often rely on abstract, complex …

ShortcutLens: A visual analytics approach for exploring shortcuts in natural language understanding dataset

Z Jin, X Wang, F Cheng, C Sun, Q Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Benchmark datasets play an important role in evaluating Natural Language Understanding
(NLU) models. However, shortcuts—unwanted biases in the benchmark datasets—can …

JailbreakHunter: a visual analytics approach for jailbreak prompts discovery from large-scale human-LLM conversational datasets

Z Jin, S Liu, H Li, X Zhao, H Qu - arXiv preprint arXiv:2407.03045, 2024 - arxiv.org
Large Language Models (LLMs) have gained significant attention but also raised concerns
due to the risk of misuse. Jailbreak prompts, a popular type of adversarial attack towards …

Exploring Empty Spaces: Human-in-the-Loop Data Augmentation

C Yeh, D Ren, Y Assogba, D Moritz… - arXiv preprint arXiv …, 2024 - arxiv.org
Data augmentation is crucial to make machine learning models more robust and safe.
However, augmenting data can be challenging as it requires generating diverse data points …