Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages
G Ramesh, S Doddapaneni, A Bheemaraj… - Transactions of the …, 2022 - direct.mit.edu
We present Samanantar, the largest publicly available parallel corpora collection for Indic
languages. The collection contains a total of 49.7 million sentence pairs between English …
Overview of the 8th workshop on Asian translation
T Nakazawa, H Nakayama, C Ding… - Proceedings of the …, 2021 - aclanthology.org
This paper presents the results of the shared tasks from the 8th workshop on Asian
translation (WAT2021). For the WAT2021, 28 teams participated in the shared tasks and 24 …
Part-of-speech tagging of Odia language using statistical and deep learning based approaches
Automatic part-of-speech (POS) tagging is a preprocessing step of many natural language
processing tasks, such as named entity recognition, speech processing, information …
The LTRC hindi-telugu parallel corpus
V Mujadia, DM Sharma - Proceedings of the Thirteenth Language …, 2022 - aclanthology.org
Abstract We present the Hindi-Telugu Parallel Corpus of different technical domains such as
Natural Science, Computer Science, Law and Healthcare along with the General domain …
Building a llama2-finetuned llm for odia language utilizing domain knowledge instruction set
Building LLMs for languages other than English is in great demand due to the unavailability
and performance of multilingual LLMs, such as understanding the local context. The …
Improving Access to Justice for the Indian Population: A Benchmark for Evaluating Translation of Legal Text to Indian Languages
Most legal text in the Indian judiciary is written in complex English due to historical reasons.
However, only about 10% of the Indian population is comfortable in reading English. Hence …
Language technologies for low resource languages: Sociolinguistic and multilingual insights
AS Doğruöz, S Sitaram - Proceedings of the 1st Annual Meeting of …, 2022 - aclanthology.org
There is a growing interest in building language technologies (LTs) for low resource
languages (LRLs). However, there are flaws in the planning, data collection and …
Addressing the data gap: building a parallel corpus for Kashmiri language
This paper marks a significant step forward in language technology for low-resource
languages by developing the first parallel corpus for the Kashmiri language, which …
Exploring pair-wise NMT for Indian languages
In this paper, we address the task of improving pair-wise machine translation for specific low
resource Indian languages. Multilingual NMT models have demonstrated a reasonable …
Universal Dependency Treebank for Odia Language
This paper presents the first publicly available treebank of Odia, a morphologically rich low
resource Indian language. The treebank contains approx. 1082 tokens (100 sentences) in …