Language model tokenizers introduce unfairness between languages

A Petrov, E La Malfa, P Torr… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent language models have shown impressive multilingual performance, even when not
explicitly trained for it. Despite this, there are concerns about the quality of their outputs …

CreoleVal: Multilingual multitask benchmarks for creoles

H Lent, K Tatariya, R Dabre, Y Chen… - Transactions of the …, 2024 - direct.mit.edu
Creoles represent an under-explored and marginalized group of languages, with few
available resources for NLP research. While the genealogical ties between Creoles and a …

What a Creole Wants, What a Creole Needs

H Lent, K Ogueji, M de Lhoneux, O Ahia… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, the natural language processing (NLP) community has given increased
attention to the disparity of efforts directed towards high-resource languages over low …

Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

PJ Lin, M Saeed, E Chang… - Proceedings of the 24th …, 2023 - isca-archive.org
Developing effective spoken language processing systems for low-resource languages
poses several challenges due to the lack of parallel data and limited resources for fine …

How to parse a creole: When Martinican Creole meets French

L Mompelat, D Dakota, S Kübler - Proceedings of the 29th …, 2022 - aclanthology.org
We investigate methods to develop a parser for Martinican Creole, a highly under-resourced
language, using a French treebank. We compare transfer learning and multi-task learning …

Ancestor-to-Creole transfer is not a walk in the park

H Lent, E Bugliarello, A Søgaard - arXiv preprint arXiv:2206.04371, 2022 - arxiv.org
We aim to learn language models for Creole languages for which large volumes of data are
not readily available, and therefore explore the potential transfer from ancestor languages …

Whose Language? Whose DH? Towards a taxonomy of definitional elusiveness in the digital humanities

J Brown - Digital Scholarship in the Humanities, 2023 - academic.oup.com
This article responds to the current interventions regarding spatio- and linguistic diversity in
the digital humanities (DHs). Previous work has focused on the practitioners of DHs …

GuyLingo: The Republic of Guyana Creole Corpora

C Clarke, R Daynauth, C Wilkinson, H Devonish… - arXiv preprint arXiv …, 2024 - arxiv.org
While major languages often enjoy substantial attention and resources, the linguistic
diversity across the globe encompasses a multitude of smaller, indigenous, and regional …

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression

Z Xu, A Gupta, T Li, O Bentham, V Srikumar - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are increasingly deployed in real-world scenarios with the
help of recent model compression techniques. Such momentum towards local deployment …

KreolMorisienMT: A dataset for Mauritian Creole machine translation

R Dabre, A Sukhoo - Findings of the Association for …, 2022 - aclanthology.org
In this paper, we describe KreolMorisienMT, a dataset for benchmarking machine translation
quality of Mauritian Creole. Mauritian Creole (Kreol Morisien) is a French-based creole and …