Scaling speech technology to 1,000+ languages
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …
No language left behind: Scaling human-centered machine translation
Driven by the goal of eradicating language barriers on a global scale, machine translation
has solidified itself as a key focus of artificial intelligence research today. However, such …
Scaling neural machine translation to 200 languages
NLLB Team - Nature, 2024 - pmc.ncbi.nlm.nih.gov
The development of neural techniques has opened up new avenues for research in
machine translation. Today, neural machine translation (NMT) systems can leverage highly …
Findings of the AmericasNLP 2021 shared task on open machine translation for indigenous languages of the Americas
This paper presents the results of the 2021 Shared Task on Open Machine Translation for
Indigenous Languages of the Americas. The shared task featured two independent tracks …
Expanding pretrained models to thousands more languages via lexicon-based adaptation
The performance of multilingual pretrained models is highly dependent on the availability of
monolingual or parallel text present in a target language. Thus, the majority of the world's …
A few thousand translations go a long way! Leveraging pre-trained models for African news translation
Recent advances in the pre-training of language models leverage large-scale datasets to
create multilingual models. However, low-resource languages are mostly left out in these …
How to adapt your pretrained multilingual model to 1600 languages
A Ebrahimi, K Kann - arXiv preprint arXiv:2106.02124, 2021 - arxiv.org
Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer,
performing best for languages seen during pretraining. While methods exist to improve …
Ethical considerations for machine translation of indigenous languages: Giving a voice to the speakers
In recent years machine translation has become very successful for high-resource language
pairs. This has also sparked new interest in research on the automatic translation of low …
JWSign: A highly multilingual corpus of Bible translations for more diversity in sign language processing
Advancements in sign language processing have been hindered by a lack of sufficient data,
impeding progress in recognition, translation, and production tasks. The absence of …
Pre-trained multilingual sequence-to-sequence models: A hope for low-resource language translation?
ESA Lee, S Thillainathan, S Nayak… - arXiv preprint arXiv …, 2022 - arxiv.org
What can pre-trained multilingual sequence-to-sequence models like mBART contribute to
translating low-resource languages? We conduct a thorough empirical experiment in 10 …