On the linguistic representational power of neural machine translation models
Despite the recent success of deep neural networks in natural language processing and
other spheres of artificial intelligence, their interpretability remains a challenge. We analyze …
A scenario-generic neural machine translation data augmentation method
Amid the rapid advancement of neural machine translation, the challenge of data sparsity
has been a major obstacle. To address this issue, this study proposes a general data …
Neural machine translation of rare words with subword units
R Sennrich - arXiv preprint arXiv:1508.07909, 2015 - research.ed.ac.uk
Neural machine translation (NMT) models typically operate with a fixed vocabulary, but
translation is an open-vocabulary problem. Previous work addresses the translation of out-of …
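Subword units are one way to handle the open-vocabulary problem described above. The sketch below shows byte-pair encoding (BPE) merge learning, a widely used method for building subword vocabularies. It assumes a toy vocabulary in which each word is written as space-separated symbols ending in an end-of-word marker `</w>`. The function names (`get_pair_stats`, `merge_pair`, `learn_bpe`) are hypothetical and not taken from any released code.

```python
import collections
import re


def get_pair_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[left, right] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace each whole-symbol occurrence of `pair` with the merged symbol."""
    bigram = re.escape(" ".join(pair))
    # Lookarounds keep the match aligned to symbol boundaries (spaces).
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Greedily merge the most frequent adjacent pair `num_merges` times."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_stats(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


toy_vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
             "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, final_vocab = learn_bpe(toy_vocab, 3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

Frequent character sequences such as "est</w>" become single symbols, while rare words remain decomposable into smaller known units.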
What do neural machine translation models learn about morphology?
Neural machine translation (MT) models obtain state-of-the-art performance while
maintaining a simple, end-to-end architecture. However, little is known about what these …
Farasa: A fast and furious segmenter for Arabic
In this paper, we present Farasa, a fast and accurate Arabic segmenter. Our approach is
based on SVM-rank using linear kernels. We measure the performance of the segmenter in …
N-gram counts and language models from the common crawl
C Buck, K Heafield, B Van Ooyen - Proceedings of the Language …, 2014 - research.ed.ac.uk
We contribute 5-gram counts and language models trained on the Common Crawl corpus, a
collection of over 9 billion web pages. This release improves upon the Google n-gram counts …
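To illustrate what the 5-gram counts above are, the following toy sketch counts every contiguous window of five tokens in a small corpus. Whitespace tokenization and the absence of sentence-boundary markers are simplifying assumptions. The function `ngram_counts` is hypothetical and does not describe how the Common Crawl release was produced.

```python
from collections import Counter


def ngram_counts(sentences, n=5):
    """Count every contiguous n-token window across the given sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()  # naive whitespace tokenization (assumption)
        # A sentence with fewer than n tokens contributes no n-grams.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


corpus = ["the cat sat on the mat", "the cat sat on the rug"]
counts = ngram_counts(corpus, n=5)
print(counts.most_common(1))  # [(('the', 'cat', 'sat', 'on', 'the'), 2)]
```

A language model estimates conditional word probabilities from counts like these, usually with smoothing, so that unseen n-grams do not receive zero probability.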
How grammatical is character-level neural machine translation? Assessing MT quality with contrastive translation pairs
R Sennrich - arXiv preprint arXiv:1612.04629, 2016 - arxiv.org
Analysing translation quality with regard to specific linguistic phenomena has historically
been difficult and time-consuming. Neural machine translation has the attractive property …
When being unseen from mBERT is just the beginning: Handling new languages with multilingual language models
Transfer learning based on pretraining language models on a large amount of raw data has
become a new norm to reach state-of-the-art performance in NLP. Still, it remains unclear …
A comparative quality evaluation of PBSMT and NMT using professional translators
S Castilho, J Moorkens, F Gaspari… - … XVI: Research Track, 2017 - aclanthology.org
Interactive machine translation research has focused primarily on predictive typing, which
requires a human to type parts of the translation. This paper explores an interactive setting in …
Aksharantar: Open Indic-language transliteration datasets and models for the next billion users
Transliteration is very important in the Indian language context due to the usage of multiple
scripts and the widespread use of romanized inputs. However, few training and evaluation …