A comprehensive analysis of bilingual lexicon induction

A Irvine, C Callison-Burch - Computational Linguistics, 2017 - direct.mit.edu
Bilingual lexicon induction is the task of inducing word translations from monolingual
corpora in two languages. In this article we present the most comprehensive analysis of …

Measuring machine translation errors in new domains

A Irvine, J Morgan, M Carpuat, H Daumé III… - Transactions of the …, 2013 - direct.mit.edu
We develop two techniques for analyzing the effect of porting a machine translation system
to a new domain. One is a macro-level analysis that measures how domain shift affects …

Semi-supervised convolutional networks for translation adaptation with tiny amount of in-domain data

B Chen, F Huang - Proceedings of The 20th SIGNLL Conference …, 2016 - aclanthology.org
In this paper, we propose a method which uses semi-supervised convolutional neural
networks (CNNs) to select in-domain training data for statistical machine translation. This …

Text rewriting improves semantic role labeling

K Woodsend, M Lapata - Journal of Artificial Intelligence Research, 2014 - jair.org
Large-scale annotated corpora are a prerequisite to developing high-performance NLP
systems. Such corpora are expensive to produce, limited in size, often demanding linguistic …

Improving statistical machine translation with a multilingual paraphrase database

RM Seraj, M Siahbani, A Sarkar - Proceedings of the 2015 …, 2015 - aclanthology.org
The multilingual Paraphrase Database (PPDB) is a freely available automatically
created resource of paraphrases in multiple languages. In statistical machine translation …

Beyond parallel data: Joint word alignment and decipherment improves machine translation

Q Dou, A Vaswani, K Knight - … of the 2014 Conference on Empirical …, 2014 - aclanthology.org
Inspired by previous work, where decipherment is used to improve machine translation, we
propose a new idea to combine word alignment and decipherment into a single learning …

Unifying Bayesian inference and vector space models for improved decipherment

Q Dou, A Vaswani, K Knight, C Dyer - Proceedings of the 53rd …, 2015 - aclanthology.org
We introduce into Bayesian decipherment a base distribution derived from similarities of
word embeddings. We use Dirichlet multinomial regression (Mimno and McCallum, 2012) to …

A survey of domain adaptation for statistical machine translation

H Cuong, K Sima'an - Machine Translation, 2017 - Springer
Differences in domains of language use between training data and test data have often
been reported to result in performance degradation for phrase-based machine translation …

Paraphrasing out-of-vocabulary words with word embeddings and semantic lexicons for low resource statistical machine translation

C Chu, S Kurohashi - … of the Tenth International Conference on …, 2016 - aclanthology.org
Out-of-vocabulary (OOV) word is a crucial problem in statistical machine translation
(SMT) with low resources. OOV paraphrasing that augments the translation model for the …

Using comparable corpora to adapt MT models to new domains

A Irvine, C Callison-Burch - Proceedings of the Ninth Workshop on …, 2014 - aclanthology.org
In previous work we showed that when using an SMT model trained on old-domain data to
translate text in a new-domain, most errors are due to unseen source words, unseen target …