Statistical machine translation

A Lopez - ACM Computing Surveys (CSUR), 2008 - dl.acm.org
Statistical machine translation (SMT) treats the translation of natural language as a machine
learning problem. By examining many samples of human-produced translation, SMT …

Farasa: A fast and furious segmenter for Arabic

A Abdelali, K Darwish, N Durrani… - Proceedings of the 2016 …, 2016 - aclanthology.org
In this paper, we present Farasa, a fast and accurate Arabic segmenter. Our approach is
based on SVM-rank using linear kernels. We measure the performance of the segmenter in …

On the impact of various types of noise on neural machine translation

H Khayrallah, P Koehn - arXiv preprint arXiv:1805.12282, 2018 - arxiv.org
We examine how various types of noise in the parallel training data impact the quality of
neural machine translation systems. We create five types of artificial noise and analyze how …

A challenge set approach to evaluating machine translation

P Isabelle, C Cherry, G Foster - arXiv preprint arXiv:1704.07431, 2017 - arxiv.org
Neural machine translation represents an exciting leap forward in translation quality. But
what longstanding weaknesses does it resolve, and which remain? We address these …

N-gram counts and language models from the Common Crawl

C Buck, K Heafield, B Van Ooyen - Proceedings of the Language …, 2014 - research.ed.ac.uk
We contribute 5-gram counts and language models trained on the Common Crawl corpus, a
collection over 9 billion web pages. This release improves upon the Google n-gram counts …

cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models

C Dyer, A Lopez, J Ganitkevitch, J Weese… - Proceedings of the …, 2010 - research.ed.ac.uk
We present cdec, an open source framework for decoding, aligning with, and training a
number of statistical machine translation models, including word-based models, phrase …

Forest reranking: Discriminative parsing with non-local features

L Huang - Proceedings of ACL-08: HLT, 2008 - aclanthology.org
Conventional n-best reranking techniques often suffer from the limited scope of the n-best list,
which rules out many potentially good alternatives. We instead propose forest reranking, a …

Encoding source language with convolutional neural network for machine translation

F Meng, Z Lu, M Wang, H Li, W Jiang, Q Liu - arXiv preprint arXiv …, 2015 - arxiv.org
The recently proposed neural network joint model (NNJM) (Devlin et al., 2014) augments the
n-gram target language model with a heuristically chosen source context window, achieving …

Forest-based translation rule extraction

H Mi, L Huang - Proceedings of the 2008 Conference on …, 2008 - aclanthology.org
Translation rule extraction is a fundamental problem in machine translation, especially for
linguistically syntax-based systems that need parse trees from either or both sides of the …

A global model for concept-to-text generation

I Konstas, M Lapata - Journal of Artificial Intelligence Research, 2013 - jair.org
Concept-to-text generation refers to the task of automatically producing textual
output from non-linguistic input. We present a joint model that captures content selection (" …