Language model tokenizers introduce unfairness between languages
Recent language models have shown impressive multilingual performance, even when not
explicitly trained for it. Despite this, there are concerns about the quality of their outputs …
CreoleVal: Multilingual multitask benchmarks for creoles
Creoles represent an under-explored and marginalized group of languages, with few
available resources for NLP research. While the genealogical ties between Creoles and a …
What a Creole Wants, What a Creole Needs
In recent years, the natural language processing (NLP) community has given increased
attention to the disparity of efforts directed towards high-resource languages over low …
Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin
Developing effective spoken language processing systems for low-resource languages
poses several challenges due to the lack of parallel data and limited resources for fine …
How to parse a creole: When Martinican Creole meets French
We investigate methods to develop a parser for Martinican Creole, a highly under-resourced
language, using a French treebank. We compare transfer learning and multi-task learning …
Ancestor-to-creole transfer is not a walk in the park
We aim to learn language models for Creole languages for which large volumes of data are
not readily available, and therefore explore the potential transfer from ancestor languages …
Whose Language? Whose DH? Towards a taxonomy of definitional elusiveness in the digital humanities
J Brown - Digital Scholarship in the Humanities, 2023 - academic.oup.com
This article responds to the current interventions regarding spatio- and linguistic diversity in
the digital humanities (DHs). Previous work has focused on the practitioners of DHs …
Guylingo: The Republic of Guyana Creole Corpora
C Clarke, R Daynauth, C Wilkinson, H Devonish… - arXiv preprint arXiv …, 2024 - arxiv.org
While major languages often enjoy substantial attention and resources, the linguistic
diversity across the globe encompasses a multitude of smaller, indigenous, and regional …
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
Large language models (LLMs) are increasingly deployed in real-world scenarios with the
help of recent model compression techniques. Such momentum towards local deployment …
KreolMorisienMT: A dataset for Mauritian Creole machine translation
R Dabre, A Sukhoo - Findings of the Association for …, 2022 - aclanthology.org
In this paper, we describe KreolMorisienMT, a dataset for benchmarking machine translation
quality of Mauritian Creole. Mauritian Creole (Kreol Morisien) is a French-based creole and …