A comprehensive analysis of bilingual lexicon induction
A Irvine, C Callison-Burch - Computational Linguistics, 2017 - direct.mit.edu
Bilingual lexicon induction is the task of inducing word translations from monolingual
corpora in two languages. In this article we present the most comprehensive analysis of …
Measuring machine translation errors in new domains
We develop two techniques for analyzing the effect of porting a machine translation system
to a new domain. One is a macro-level analysis that measures how domain shift affects …
Semi-supervised convolutional networks for translation adaptation with tiny amount of in-domain data
In this paper, we propose a method which uses semi-supervised convolutional neural
networks (CNNs) to select in-domain training data for statistical machine translation. This …
Text rewriting improves semantic role labeling
K Woodsend, M Lapata - Journal of Artificial Intelligence Research, 2014 - jair.org
Large-scale annotated corpora are a prerequisite to developing high-performance NLP
systems. Such corpora are expensive to produce, limited in size, often demanding linguistic …
Improving statistical machine translation with a multilingual paraphrase database
The multilingual Paraphrase Database (PPDB) is a freely available automatically
created resource of paraphrases in multiple languages. In statistical machine translation …
Beyond parallel data: Joint word alignment and decipherment improves machine translation
Inspired by previous work, where decipherment is used to improve machine translation, we
propose a new idea to combine word alignment and decipherment into a single learning …
Unifying Bayesian inference and vector space models for improved decipherment
We introduce into Bayesian decipherment a base distribution derived from similarities of
word embeddings. We use Dirichlet multinomial regression (Mimno and McCallum, 2012) to …
A survey of domain adaptation for statistical machine translation
H Cuong, K Sima'an - Machine Translation, 2017 - Springer
Differences in domains of language use between training data and test data have often
been reported to result in performance degradation for phrase-based machine translation …
Paraphrasing out-of-vocabulary words with word embeddings and semantic lexicons for low resource statistical machine translation
C Chu, S Kurohashi - … of the Tenth International Conference on …, 2016 - aclanthology.org
Out-of-vocabulary (OOV) word is a crucial problem in statistical machine translation
(SMT) with low resources. OOV paraphrasing that augments the translation model for the …
Using comparable corpora to adapt MT models to new domains
A Irvine, C Callison-Burch - Proceedings of the Ninth Workshop on …, 2014 - aclanthology.org
In previous work we showed that when using an SMT model trained on old-domain data to
translate text in a new-domain, most errors are due to unseen source words, unseen target …