OdiEnCorp 2.0: Odia-English parallel corpus for machine translation

G Ramesh, S Doddapaneni, A Bheemaraj… - Transactions of the …, 2022 - direct.mit.edu

We present Samanantar, the largest publicly available parallel corpora collection for Indic
languages. The collection contains a total of 49.7 million sentence pairs between English …

Cited by 108 · Related articles · All 11 versions

Overview of the 8th workshop on Asian translation

T Nakazawa, H Nakayama, C Ding… - Proceedings of the …, 2021 - aclanthology.org

This paper presents the results of the shared tasks from the 8th workshop on Asian
translation (WAT2021). For the WAT2021, 28 teams participated in the shared tasks and 24 …

Cited by 167 · Related articles · All 15 versions

Part-of-speech tagging of Odia language using statistical and deep learning based approaches

T Dalai, TK Mishra, PK Sa - ACM Transactions on Asian and Low …, 2023 - dl.acm.org

Automatic part-of-speech (POS) tagging is a preprocessing step of many natural language
processing tasks, such as named entity recognition, speech processing, information …

Cited by 12 · Related articles · All 4 versions

The LTRC Hindi-Telugu parallel corpus

V Mujadia, DM Sharma - Proceedings of the Thirteenth Language …, 2022 - aclanthology.org

We present the Hindi-Telugu Parallel Corpus of different technical domains such as
Natural Science, Computer Science, Law and Healthcare along with the General domain …

Cited by 7 · Related articles · All 2 versions

Building a Llama2-finetuned LLM for Odia language utilizing domain knowledge instruction set

GS Kohli, S Parida, S Sekhar, S Saha, NB Nair… - Proceedings of the …, 2023 - dl.acm.org

Building LLMs for languages other than English is in great demand due to the unavailability
and performance of multilingual LLMs, such as understanding the local context. The …

Cited by 2 · Related articles · All 3 versions

Improving Access to Justice for the Indian Population: A Benchmark for Evaluating Translation of Legal Text to Indian Languages

S Mahapatra, D Datta, S Soni, A Goswami… - arXiv preprint arXiv …, 2023 - arxiv.org

Most legal text in the Indian judiciary is written in complex English due to historical reasons.
However, only about 10% of the Indian population is comfortable in reading English. Hence …

Cited by 2 · Related articles · All 2 versions

Language technologies for low resource languages: Sociolinguistic and multilingual insights

AS Doğruöz, S Sitaram - Proceedings of the 1st Annual Meeting of …, 2022 - aclanthology.org

There is a growing interest in building language technologies (LTs) for low resource
languages (LRLs). However, there are flaws in the planning, data collection and …

Cited by 5 · Related articles · All 5 versions

Addressing the data gap: building a parallel corpus for Kashmiri language

SMU Qumar, M Azim, SMK Quadri - International Journal of Information …, 2024 - Springer

This paper marks a significant step forward in language technology for low-resource
languages by developing the first parallel corpus for the Kashmiri language, which …

Exploring pair-wise NMT for Indian languages

K Akella, SH Allu, SS Ragupathi, A Singhal… - arXiv preprint arXiv …, 2020 - arxiv.org

In this paper, we address the task of improving pair-wise machine translation for specific low
resource Indian languages. Multilingual NMT models have demonstrated a reasonable …

Cited by 7 · Related articles · All 5 versions

Universal Dependency Treebank for Odia Language

S Parida, K Sahoo, AK Ojha, S Sahoo, SR Dash… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper presents the first publicly available treebank of Odia, a morphologically rich low
resource Indian language. The treebank contains approx. 1082 tokens (100 sentences) in …

Cited by 1 · Related articles · All 7 versions