Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models
While counterfactual examples are useful for analysis and training of NLP models, current
generation methods either rely on manual labor to create very few counterfactuals, or only …
PaTAT: Human-AI collaborative qualitative coding with explainable interactive rule synthesis
Over the years, the task of AI-assisted data annotation has seen remarkable advancements.
However, a specific type of annotation task, the qualitative coding performed during thematic …
MultiCoNER: A large-scale multilingual dataset for complex named entity recognition
We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that
covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as …
GEMNET: Effective gated gazetteer representations for recognizing complex entities in low-context input
Named Entity Recognition (NER) remains difficult in real-world settings; current
challenges include short texts (low context), emerging entities, and complex entities (e.g. …
ScatterShot: Interactive in-context example curation for text transformation
The in-context learning capabilities of LLMs like GPT-3 allow annotators to customize an
LLM to their specific tasks with a small number of examples. However, users tend to include …
Supporting Sensemaking of Large Language Model Outputs at Scale
Large language models (LLMs) are capable of generating multiple responses to a single
prompt, yet little effort has been expended to help end-users or system designers make use …
Intuitively assessing ML model reliability through example-based explanations and editing model inputs
Interpretability methods aim to help users build trust in and understand the capabilities of
machine learning models. However, existing approaches often rely on abstract, complex …
ShortcutLens: A visual analytics approach for exploring shortcuts in natural language understanding dataset
Benchmark datasets play an important role in evaluating Natural Language Understanding
(NLU) models. However, shortcuts—unwanted biases in the benchmark datasets—can …
JailbreakHunter: A visual analytics approach for jailbreak prompts discovery from large-scale human-LLM conversational datasets
Large Language Models (LLMs) have gained significant attention but also raised concerns
due to the risk of misuse. Jailbreak prompts, a popular type of adversarial attack towards …
Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Data augmentation is crucial to make machine learning models more robust and safe.
However, augmenting data can be challenging as it requires generating diverse data points …