DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages
Language technologies should be judged on their usefulness in real-world use cases. An
often overlooked aspect in natural language processing (NLP) research and evaluation is …
Natural language processing for dialects of a language: A survey
State-of-the-art natural language processing (NLP) models are trained on massive training
corpora, and report a superlative performance on evaluation datasets. This survey delves …
Quantifying the dialect gap and its correlates across languages
Historically, researchers and consumers have noticed a decrease in quality when applying
NLP tools to minority variants of languages (i.e. Puerto Rican Spanish or Swiss German), but …
GlobalBench: A benchmark for global progress in natural language processing
Despite the major advances in NLP, significant disparities in NLP system performance
across languages still exist. Arguably, these are due to uneven resource allocation and sub …
VALUE: Understanding dialect disparity in NLU
English Natural Language Understanding (NLU) systems have achieved great
performances and even outperformed humans on benchmarks like GLUE and SuperGLUE …
What to do about non-standard (or non-canonical) language in NLP
B Plank - arXiv preprint arXiv:1608.07836, 2016 - arxiv.org
Real world data differs radically from the benchmark corpora we use in natural language
processing (NLP). As soon as we apply our technologies to the real world, performance …
Datasets: A community library for natural language processing
The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as
researchers propose new tasks, larger models, and novel benchmarks. Datasets is a …
Multi-VALUE: A framework for cross-dialectal English NLP
Dialect differences caused by regional, social, and economic factors cause performance
discrepancies for many groups of language technology users. Inclusive and equitable …
Glot500: Scaling multilingual corpora and language models to 500 languages
The NLP community has mainly focused on scaling Large Language Models (LLMs)
vertically, i.e., making them better for about 100 languages. We instead scale LLMs …
TADA: Task-agnostic dialect adapters for English
Large Language Models, the dominant starting point for Natural Language Processing
(NLP) applications, fail at a higher rate for speakers of English dialects other than Standard …