OdiEnCorp: Odia–English and Odia-only corpus for machine translation

H Laurençon, L Saulnier, T Wang… - Advances in …, 2022 - proceedings.neurips.cc

As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …

Cited by 133 (all 21 versions)

Overview of the 8th workshop on Asian translation

T Nakazawa, H Nakayama, C Ding… - Proceedings of the …, 2021 - aclanthology.org

This paper presents the results of the shared tasks from the 8th workshop on Asian
translation (WAT2021). For the WAT2021, 28 teams participated in the shared tasks and 24 …

Cited by 167 (all 15 versions)

A multilingual parallel corpora collection effort for Indian languages

S Siripragada, J Philip, VP Namboodiri… - arXiv preprint arXiv …, 2020 - arxiv.org

We present sentence aligned parallel corpora across 10 Indian Languages - Hindi, Telugu,
Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi, Punjabi, and English - many of …

Cited by 59 (all 11 versions)

Revisiting low resource status of Indian languages in machine translation

J Philip, S Siripragada, VP Namboodiri… - Proceedings of the 3rd …, 2021 - dl.acm.org

Indian language machine translation performance is hampered due to the lack of large scale
multi-lingual sentence aligned corpora and robust benchmarks. Through this paper, we …

Cited by 33 (all 5 versions)

Part-of-speech tagging of Odia language using statistical and deep learning based approaches

T Dalai, TK Mishra, PK Sa - ACM Transactions on Asian and Low …, 2023 - dl.acm.org

Automatic part-of-speech (POS) tagging is a preprocessing step of many natural language
processing tasks, such as named entity recognition, speech processing, information …

Cited by 12 (all 4 versions)

A large-scale evaluation of neural machine transliteration for Indic languages

A Kunchukuttan, S Jain, R Kejriwal - … of the 16th Conference of the …, 2021 - aclanthology.org

We take up the task of large-scale evaluation of neural machine transliteration between
English and Indic languages, with a focus on multilingual transliteration to utilize …

Cited by 11 (all 4 versions)

[HTML] Four Million Segments and Counting: Building an English-Croatian Parallel Corpus through Crowdsourcing Using a Novel Gamification-Based Platform

R Jaworski, S Seljan, I Dunđer - Information, 2023 - mdpi.com

Parallel corpora have been widely used in the fields of natural language processing and
translation as they provide crucial multilingual information. They are used to train machine …

Cited by 2 (all 7 versions)

Efficiently reusing old models across languages via transfer learning

T Kocmi, O Bojar - arXiv preprint arXiv:1909.10955, 2019 - arxiv.org

Recent progress in neural machine translation is directed towards larger neural networks
trained on an increasing amount of hardware resources. As a result, NMT models are costly …

Cited by 11 (all 7 versions)

Open machine translation for low resource South American languages (AmericasNLP 2021 shared task contribution)

S Parida, S Panda, A Dash… - First Workshop on …, 2021 - biblio.ugent.be

This paper describes the team (“Tamalli”)'s submission to AmericasNLP2021 shared task on
Open Machine Translation for low resource South American languages. Our goal was to …

Cited by 7 (all 11 versions)

The reality of multi-lingual machine translation

T Kocmi, D Macháček, O Bojar - arXiv preprint arXiv:2202.12814, 2022 - arxiv.org

Our book "The Reality of Multi-Lingual Machine Translation" discusses the benefits and
perils of using more than two languages in machine translation systems. While focused on …

Cited by 3 (all 4 versions)