A survey of deep learning techniques for neural machine translation

S Yang, Y Wang, X Chu - arXiv preprint arXiv:2002.07526, 2020 - arxiv.org
In recent years, natural language processing (NLP) has seen great progress driven by deep
learning techniques. In the sub-field of machine translation, a new approach named Neural …

Multilingual denoising pre-training for neural machine translation

Y Liu, J Gu, N Goyal, X Li, S Edunov… - Transactions of the …, 2020 - direct.mit.edu
This paper demonstrates that multilingual denoising pre-training produces significant
performance gains across a wide variety of machine translation (MT) tasks. We present …

On the linguistic representational power of neural machine translation models

Y Belinkov, N Durrani, F Dalvi, H Sajjad… - Computational …, 2020 - direct.mit.edu
Despite the recent success of deep neural networks in natural language processing and
other spheres of artificial intelligence, their interpretability remains a challenge. We analyze …

Masked language model scoring

J Salazar, D Liang, TQ Nguyen, K Kirchhoff - arXiv preprint arXiv …, 2019 - arxiv.org
Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead,
we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are …

The natural language decathlon: Multitask learning as question answering

B McCann, NS Keskar, C Xiong, R Socher - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning has improved performance on many natural language processing (NLP)
tasks individually. However, general NLP models cannot emerge within a paradigm that …

Learned in translation: Contextualized word vectors

B McCann, J Bradbury, C Xiong… - Advances in neural …, 2017 - proceedings.neurips.cc
Computer vision has benefited from initializing multiple deep layers with weights pretrained
on large supervised training sets like ImageNet. Natural language processing (NLP) …

Understanding and improving layer normalization

J Xu, X Sun, Z Zhang, G Zhao… - Advances in neural …, 2019 - proceedings.neurips.cc
Layer normalization (LayerNorm) is a technique to normalize the distributions of
intermediate layers. It enables smoother gradients, faster training, and better generalization …

Semi-supervised sequence modeling with cross-view training

K Clark, MT Luong, CD Manning, QV Le - arXiv preprint arXiv:1809.08370, 2018 - arxiv.org
Unsupervised representation learning algorithms such as word2vec and ELMo improve the
accuracy of many supervised NLP models, mainly because they can take advantage of large …

Graph optimal transport for cross-domain alignment

L Chen, Z Gan, Y Cheng, L Li… - … on Machine Learning, 2020 - proceedings.mlr.press
Cross-domain alignment between two sets of entities (e.g., objects in an image, words in a
sentence) is fundamental to both computer vision and natural language processing. Existing …

Document-level neural machine translation with hierarchical attention networks

L Miculicich, D Ram, N Pappas… - arXiv preprint arXiv …, 2018 - arxiv.org
Neural Machine Translation (NMT) can be improved by including document-level contextual
information. For this purpose, we propose a hierarchical attention model to capture the …