Grapheme-to-phoneme models for (almost) any language
Grapheme-to-phoneme (g2p) models are rarely available in low-resource
languages, as the creation of training and evaluation data is expensive and time-consuming …
Aksharantar: Open Indic-language transliteration datasets and models for the next billion users
Transliteration is very important in the Indian language context due to the usage of multiple
scripts and the widespread use of romanized inputs. However, few training and evaluation …
Design challenges in named entity transliteration
We analyze some of the fundamental design challenges that impact the development of a
multilingual state-of-the-art named entity transliteration system, including curating bilingual …
Cross-language entity linking
There has been substantial recent interest in aligning mentions of named entities in
unstructured texts to knowledge base descriptors, a task commonly called entity linking. This …
Crisis MT: Developing a cookbook for MT in crisis situations
In this paper, we propose that MT is an important technology in crisis events, something that
can and should be an integral part of a rapid-response infrastructure. By integrating MT …
A comprehensive analysis of bilingual lexicon induction
A Irvine, C Callison-Burch - Computational Linguistics, 2017 - direct.mit.edu
Bilingual lexicon induction is the task of inducing word translations from monolingual
corpora in two languages. In this article we present the most comprehensive analysis of …
Supervised bilingual lexicon induction with multiple monolingual signals
A Irvine, C Callison-Burch - … of the 2013 Conference of the North …, 2013 - aclanthology.org
Prior research into learning translations from source and target language monolingual texts
has treated the task as an unsupervised learning problem. Although many techniques take …
End-to-end statistical machine translation with zero or small parallel texts
A Irvine, C Callison-Burch - Natural Language Engineering, 2016 - cambridge.org
We use bilingual lexicon induction techniques, which learn translations from monolingual
texts in two languages, to build an end-to-end statistical machine translation (SMT) system …
An Arabizi-English social media statistical machine translation system
J May, Y Benjira, A Echihabi - … of the 11th Conference of the …, 2014 - aclanthology.org
We present a machine translation engine that can translate romanized Arabic, often known
as Arabizi, into English. With such a system we can, for the first time, translate the massive …
Bootstrapping transliteration with constrained discovery for low-resource languages
Generating the English transliteration of a name written in a foreign script is an important
and challenging step in multilingual knowledge acquisition and information extraction …