DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages

F Faisal, O Ahia, A Srivastava, K Ahuja… - arXiv preprint arXiv …, 2024 - arxiv.org
Language technologies should be judged on their usefulness in real-world use cases. An
often overlooked aspect in natural language processing (NLP) research and evaluation is …

Natural language processing for dialects of a language: A survey

A Joshi, R Dabre, D Kanojia, Z Li, H Zhan… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art natural language processing (NLP) models are trained on massive training
corpora, and report a superlative performance on evaluation datasets. This survey delves …

Quantifying the dialect gap and its correlates across languages

A Kantharuban, I Vulić, A Korhonen - arXiv preprint arXiv:2310.15135, 2023 - arxiv.org
Historically, researchers and consumers have noticed a decrease in quality when applying
NLP tools to minority variants of languages (i.e., Puerto Rican Spanish or Swiss German), but …

GlobalBench: A benchmark for global progress in natural language processing

Y Song, C Cui, S Khanuja, P Liu, F Faisal… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the major advances in NLP, significant disparities in NLP system performance
across languages still exist. Arguably, these are due to uneven resource allocation and sub …

VALUE: Understanding dialect disparity in NLU

C Ziems, J Chen, C Harris, J Anderson… - arXiv preprint arXiv …, 2022 - arxiv.org
English Natural Language Understanding (NLU) systems have achieved strong
performance and even outperformed humans on benchmarks like GLUE and SuperGLUE …

What to do about non-standard (or non-canonical) language in NLP

B Plank - arXiv preprint arXiv:1608.07836, 2016 - arxiv.org
Real world data differs radically from the benchmark corpora we use in natural language
processing (NLP). As soon as we apply our technologies to the real world, performance …

Datasets: A community library for natural language processing

Q Lhoest, AV Del Moral, Y Jernite, A Thakur… - arXiv preprint arXiv …, 2021 - arxiv.org
The scale, variety, and quantity of publicly-available NLP datasets have grown rapidly as
researchers propose new tasks, larger models, and novel benchmarks. Datasets is a …

Multi-VALUE: A framework for cross-dialectal English NLP

C Ziems, W Held, J Yang, J Dhamala, R Gupta… - arXiv preprint arXiv …, 2022 - arxiv.org
Dialect differences arising from regional, social, and economic factors cause performance
discrepancies for many groups of language technology users. Inclusive and equitable …

Glot500: Scaling multilingual corpora and language models to 500 languages

A Imani, P Lin, AH Kargaran, S Severini… - arXiv preprint arXiv …, 2023 - arxiv.org
The NLP community has mainly focused on scaling Large Language Models (LLMs)
vertically, i.e., making them better for about 100 languages. We instead scale LLMs …

TADA: Task-agnostic dialect adapters for English

W Held, C Ziems, D Yang - arXiv preprint arXiv:2305.16651, 2023 - arxiv.org
Large Language Models, the dominant starting point for Natural Language Processing
(NLP) applications, fail at a higher rate for speakers of English dialects other than Standard …