Neural machine translation: A review
F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …
Deep learning in electron microscopy
JM Ede - Machine Learning: Science and Technology, 2021 - iopscience.iop.org
Deep learning is transforming most areas of science and technology, including electron
microscopy. This review paper offers a practical perspective aimed at developers with …
Masked language model scoring
Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead,
we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are …
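In the notation suggested by this snippet (symbols are a reconstruction, not quoted from the paper), the pseudo-log-likelihood of a sentence W = (w_1, …, w_{|W|}) sums the MLM's log-probability of each token when that token alone is masked out:

```latex
\mathrm{PLL}(W) \;=\; \sum_{t=1}^{|W|} \log P_{\mathrm{MLM}}\!\left(w_t \mid W_{\setminus t}\right)
```

Unlike a left-to-right log-likelihood, each term conditions on bidirectional context, which is why the pretrained MLM can be scored out of the box without finetuning.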
Revisiting low-resource neural machine translation: A case study
R Sennrich, B Zhang - arXiv preprint arXiv:1905.11901, 2019 - arxiv.org
It has been shown that the performance of neural machine translation (NMT) drops starkly in
low-resource conditions, underperforming phrase-based statistical machine translation …
Root mean square layer normalization
B Zhang, R Sennrich - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Layer normalization (LayerNorm) has been successfully applied to various deep neural
networks to help stabilize training and boost model convergence because of its capability in …
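RMSNorm, as summarized above, drops LayerNorm's mean-centering and rescales activations by a single root-mean-square statistic with a learned gain. A minimal NumPy sketch (the function name and `eps` default are illustrative, not taken from the paper):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-8):
    # Rescale by the root mean square of the activations; unlike
    # LayerNorm, no mean is subtracted and no variance is computed.
    rms = np.sqrt(np.mean(x ** 2) + eps)
    return (x / rms) * gain

x = np.array([1.0, 2.0, 3.0, 4.0])
y = rms_norm(x, gain=np.ones(4))
```

With a unit gain, the output has an RMS of 1 by construction, which is the re-scaling invariance the paper credits for stabilizing training at lower cost than LayerNorm.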
Transformers without tears: Improving the normalization of self-attention
We evaluate three simple, normalization-centric changes to improve Transformer training.
First, we show that pre-norm residual connections (PreNorm) and smaller initializations …
Understanding and improving lexical choice in non-autoregressive translation
Knowledge distillation (KD) is essential for training non-autoregressive translation (NAT)
models by reducing the complexity of the raw data with an autoregressive teacher model. In …
Domain adaptation and multi-domain adaptation for neural machine translation: A survey
D Saunders - Journal of Artificial Intelligence Research, 2022 - jair.org
The development of deep learning techniques has allowed Neural Machine Translation
(NMT) models to become extremely powerful, given sufficient training data and training time …
Self-attentional acoustic models
Self-attention is a method of encoding sequences of vectors by relating these vectors to
each other based on pairwise similarities. These models have recently shown promising …
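The "pairwise similarities" mentioned in this snippet are, in the standard scaled dot-product formulation, softmax-normalized inner products between sequence positions. A minimal NumPy sketch with queries, keys, and values all equal to the input (an illustrative simplification, not the paper's acoustic-model architecture):

```python
import numpy as np

def self_attention(X):
    # Scaled dot-product self-attention with Q = K = V = X:
    # pairwise similarity scores are softmax-normalized per row,
    # and each output is a weighted average of all input vectors.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    # Subtract the row max before exponentiating for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

X = np.random.default_rng(0).standard_normal((5, 8))
Y = self_attention(X)
```

Each output row is a convex combination of the input rows, so every position can attend to every other position in a single step, regardless of distance.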
Variational graph normalized autoencoders
Link prediction is one of the key problems for graph-structured data. With the advancement
of graph neural networks, graph autoencoders (GAEs) and variational graph autoencoders …