A survey of deep learning techniques for neural machine translation

S Yang, Y Wang, X Chu - arXiv preprint arXiv:2002.07526, 2020 - arxiv.org
In recent years, natural language processing (NLP) has seen great progress driven by deep
learning techniques. In the sub-field of machine translation, a new approach named Neural …

Multilingual denoising pre-training for neural machine translation

Y Liu, J Gu, N Goyal, X Li, S Edunov… - Transactions of the …, 2020 - direct.mit.edu
This paper demonstrates that multilingual denoising pre-training produces significant
performance gains across a wide variety of machine translation (MT) tasks. We present …

On the linguistic representational power of neural machine translation models

Y Belinkov, N Durrani, F Dalvi, H Sajjad… - Computational …, 2020 - direct.mit.edu
Despite the recent success of deep neural networks in natural language processing and
other spheres of artificial intelligence, their interpretability remains a challenge. We analyze …

Masked language model scoring

J Salazar, D Liang, TQ Nguyen, K Kirchhoff - arXiv preprint arXiv …, 2019 - arxiv.org
Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead,
we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are …

The natural language decathlon: Multitask learning as question answering

B McCann, NS Keskar, C Xiong, R Socher - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning has improved performance on many natural language processing (NLP)
tasks individually. However, general NLP models cannot emerge within a paradigm that …

Learned in translation: Contextualized word vectors

B McCann, J Bradbury, C Xiong… - Advances in neural …, 2017 - proceedings.neurips.cc
Computer vision has benefited from initializing multiple deep layers with weights pretrained
on large supervised training sets like ImageNet. Natural language processing (NLP) …

Understanding and improving layer normalization

J Xu, X Sun, Z Zhang, G Zhao… - Advances in neural …, 2019 - proceedings.neurips.cc
Layer normalization (LayerNorm) is a technique to normalize the distributions of
intermediate layers. It enables smoother gradients, faster training, and better generalization …

Semi-supervised sequence modeling with cross-view training

K Clark, MT Luong, CD Manning, QV Le - arXiv preprint arXiv:1809.08370, 2018 - arxiv.org
Unsupervised representation learning algorithms such as word2vec and ELMo improve the
accuracy of many supervised NLP models, mainly because they can take advantage of large …

Graph optimal transport for cross-domain alignment

L Chen, Z Gan, Y Cheng, L Li… - … on Machine Learning, 2020 - proceedings.mlr.press
Cross-domain alignment between two sets of entities (e.g., objects in an image, words in a
sentence) is fundamental to both computer vision and natural language processing. Existing …

Document-level neural machine translation with hierarchical attention networks

L Miculicich, D Ram, N Pappas… - arXiv preprint arXiv …, 2018 - arxiv.org
Neural Machine Translation (NMT) can be improved by including document-level contextual
information. For this purpose, we propose a hierarchical attention model to capture the …