Neural machine translation: A review
F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …
Deep learning in electron microscopy
JM Ede - Machine Learning: Science and Technology, 2021 - iopscience.iop.org
Deep learning is transforming most areas of science and technology, including electron
microscopy. This review paper offers a practical perspective aimed at developers with …
Masked language model scoring
Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead,
we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are …
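In the notation suggested by this snippet (symbols are a reconstruction, not quoted from the paper), the pseudo-log-likelihood of a sentence W = (w_1, …, w_{|W|}) sums the MLM's log-probability of each token when that token alone is masked out:

```latex
\mathrm{PLL}(W) \;=\; \sum_{t=1}^{|W|} \log P_{\mathrm{MLM}}\!\left(w_t \mid W_{\setminus t}\right)
```

Unlike a left-to-right log-likelihood, each term conditions on bidirectional context, which is why the pretrained MLM can be scored out of the box without finetuning.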
Revisiting low-resource neural machine translation: A case study
R Sennrich, B Zhang - arXiv preprint arXiv:1905.11901, 2019 - arxiv.org
It has been shown that the performance of neural machine translation (NMT) drops starkly in
low-resource conditions, underperforming phrase-based statistical machine translation …
Root mean square layer normalization
B Zhang, R Sennrich - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Layer normalization (LayerNorm) has been successfully applied to various deep neural
networks to help stabilize training and boost model convergence because of its capability in …
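RMSNorm, as summarized above, drops LayerNorm's mean-centering and rescales activations by a single root-mean-square statistic with a learned gain. A minimal NumPy sketch (the function name and `eps` default are illustrative, not taken from the paper):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-8):
    # Rescale by the root mean square of the activations; unlike
    # LayerNorm, no mean is subtracted and no variance is computed.
    rms = np.sqrt(np.mean(x ** 2) + eps)
    return (x / rms) * gain

x = np.array([1.0, 2.0, 3.0, 4.0])
y = rms_norm(x, gain=np.ones(4))
```

With a unit gain, the output has an RMS of 1 by construction, which is the re-scaling invariance the paper credits for stabilizing training at lower cost than LayerNorm.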
Transformers without tears: Improving the normalization of self-attention
We evaluate three simple, normalization-centric changes to improve Transformer training.
First, we show that pre-norm residual connections (PreNorm) and smaller initializations …
Understanding and improving lexical choice in non-autoregressive translation
Knowledge distillation (KD) is essential for training non-autoregressive translation (NAT)
models by reducing the complexity of the raw data with an autoregressive teacher model. In …
Domain adaptation and multi-domain adaptation for neural machine translation: A survey
D Saunders - Journal of Artificial Intelligence Research, 2022 - jair.org
The development of deep learning techniques has allowed Neural Machine Translation
(NMT) models to become extremely powerful, given sufficient training data and training time …
Self-attentional acoustic models
Self-attention is a method of encoding sequences of vectors by relating these vectors to
each other based on pairwise similarities. These models have recently shown promising …
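The "pairwise similarities" mentioned in this snippet are, in the standard scaled dot-product formulation, softmax-normalized inner products between sequence positions. A minimal NumPy sketch with queries, keys, and values all equal to the input (an illustrative simplification, not the paper's acoustic-model architecture):

```python
import numpy as np

def self_attention(X):
    # Scaled dot-product self-attention with Q = K = V = X:
    # pairwise similarity scores are softmax-normalized per row,
    # and each output is a weighted average of all input vectors.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    # Subtract the row max before exponentiating for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

X = np.random.default_rng(0).standard_normal((5, 8))
Y = self_attention(X)
```

Each output row is a convex combination of the input rows, so every position can attend to every other position in a single step, regardless of distance.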
Variational graph normalized autoencoders
Link prediction is one of the key problems for graph-structured data. With the advancement
of graph neural networks, graph autoencoders (GAEs) and variational graph autoencoders …