Language varieties of Italy: Technology challenges and opportunities
A Ramponi - Transactions of the Association for Computational …, 2024 - direct.mit.edu
Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which
implicitly encodes local knowledge, cultural traditions, artistic expressions, and history of its …
Language variety identification with true labels
Language identification is an important first step in many IR and NLP applications. Most
publicly available language identification datasets, however, are compiled under the …
VarDial evaluation campaign 2024: Commonsense reasoning in dialects and multi-label similar language identification
This report presents the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2024. The campaign is part of the eleventh workshop on Natural …
DiatopIt: A corpus of social media posts for the study of diatopic language variation in Italy
We introduce DiatopIt, the first corpus specifically focused on diatopic language variation in
Italy for language varieties other than Standard Italian. DiatopIt comprises over 15K …
Italian language and dialect identification and regional French variety detection using adaptive naive Bayes
T Jauhiainen, H Jauhiainen… - International …, 2022 - researchportal.helsinki.fi
These proceedings include the 13 papers presented at the Ninth Workshop on NLP for
Similar Languages, Varieties and Dialects (VarDial), co-located with the 29th International …
DADA: Dialect adaptation via dynamic aggregation of linguistic rules
Existing large language models (LLMs) that mainly focus on Standard American English
(SAE) often lead to significantly worse performance when being applied to other English …
What do dialect speakers want? A survey of attitudes towards language technology for German dialects
Natural language processing (NLP) has largely focused on modelling standardized
languages. More recently, attention has increasingly shifted to local, non-standardized …
Fine-tuning BERT with character-level noise for zero-shot transfer to dialects and closely-related languages
A Srivastava, D Chiang - arXiv preprint arXiv:2303.17683, 2023 - arxiv.org
In this work, we induce character-level noise in various forms when fine-tuning BERT to
enable zero-shot cross-lingual transfer to unseen dialects and languages. We fine-tune …
Optimizing naive Bayes for Arabic dialect identification
T Jauhiainen, H Jauhiainen… - Arabic Natural …, 2022 - researchportal.helsinki.fi
This article describes the language identification system used by the SUKI team in the 2022
Nuanced Arabic Dialect Identification (NADI) shared task. In addition to the system …
Dialect and variant identification as a multi-label classification task: A proposal based on near-duplicate analysis
G Bernier-Colborne, C Goutte… - Tenth Workshop on NLP …, 2023 - aclanthology.org
We argue that dialect identification should be treated as a multi-label classification problem
rather than the single-class setting prevalent in existing collections and evaluations. In order …