Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

No language left behind: Scaling human-centered machine translation

MR Costa-jussà, J Cross, O Çelebi, M Elbayad… - arXiv preprint arXiv …, 2022 - arxiv.org
Driven by the goal of eradicating language barriers on a global scale, machine translation
has solidified itself as a key focus of artificial intelligence research today. However, such …

Scaling neural machine translation to 200 languages

NLLB Team - Nature, 2024 - pmc.ncbi.nlm.nih.gov
The development of neural techniques has opened up new avenues for research in
machine translation. Today, neural machine translation (NMT) systems can leverage highly …

Findings of the AmericasNLP 2021 shared task on open machine translation for indigenous languages of the Americas

M Mager, A Oncevay, A Ebrahimi, J Ortega… - Proceedings of the …, 2021 - aclanthology.org
This paper presents the results of the 2021 Shared Task on Open Machine Translation for
Indigenous Languages of the Americas. The shared task featured two independent tracks …

Expanding pretrained models to thousands more languages via lexicon-based adaptation

X Wang, S Ruder, G Neubig - arXiv preprint arXiv:2203.09435, 2022 - arxiv.org
The performance of multilingual pretrained models is highly dependent on the availability of
monolingual or parallel text present in a target language. Thus, the majority of the world's …

A few thousand translations go a long way! Leveraging pre-trained models for African news translation

DI Adelani, JO Alabi, A Fan, J Kreutzer, X Shen… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent advances in the pre-training of language models leverage large-scale datasets to
create multilingual models. However, low-resource languages are mostly left out in these …

How to adapt your pretrained multilingual model to 1600 languages

A Ebrahimi, K Kann - arXiv preprint arXiv:2106.02124, 2021 - arxiv.org
Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer,
performing best for languages seen during pretraining. While methods exist to improve …

Ethical considerations for machine translation of indigenous languages: Giving a voice to the speakers

M Mager, E Mager, K Kann, NT Vu - arXiv preprint arXiv:2305.19474, 2023 - arxiv.org
In recent years machine translation has become very successful for high-resource language
pairs. This has also sparked new interest in research on the automatic translation of low …

JWSign: A highly multilingual corpus of Bible translations for more diversity in sign language processing

S Gueuwou, S Siake, C Leong, M Müller - arXiv preprint arXiv:2311.10174, 2023 - arxiv.org
Advancements in sign language processing have been hindered by a lack of sufficient data,
impeding progress in recognition, translation, and production tasks. The absence of …

Pre-trained multilingual sequence-to-sequence models: A hope for low-resource language translation?

ESA Lee, S Thillainathan, S Nayak… - arXiv preprint arXiv …, 2022 - arxiv.org
What can pre-trained multilingual sequence-to-sequence models like mBART contribute to
translating low-resource languages? We conduct a thorough empirical experiment in 10 …