A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …

R-Drop: Regularized dropout for neural networks

L Wu, J Li, Y Wang, Q Meng, T Qin… - Advances in …, 2021 - proceedings.neurips.cc
Dropout is a powerful and widely used technique to regularize the training of deep neural
networks. Though effective and performing well, the randomness introduced by dropout …
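
The snippet cuts off before the method itself, but the R-Drop recipe is compact: forward each batch twice so the two dropout masks differ, and add a symmetric KL term that penalizes disagreement between the two predicted distributions. A minimal PyTorch sketch, assuming a toy classifier and an illustrative `alpha` weight rather than the authors' exact setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def r_drop_loss(model, x, y, alpha=1.0):
    # Two forward passes over the same batch; dropout samples a
    # different mask each time, so logits1 != logits2 in general.
    logits1 = model(x)
    logits2 = model(x)
    # Usual cross-entropy, averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))
    # Symmetric KL between the two predictive distributions.
    kl = 0.5 * (
        F.kl_div(F.log_softmax(logits1, dim=-1),
                 F.softmax(logits2, dim=-1), reduction="batchmean")
        + F.kl_div(F.log_softmax(logits2, dim=-1),
                   F.softmax(logits1, dim=-1), reduction="batchmean")
    )
    return ce + alpha * kl

# Toy classifier with dropout; model.train() keeps dropout active.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                      nn.Dropout(0.3), nn.Linear(64, 4))
model.train()
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
r_drop_loss(model, x, y).backward()
```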

A scenario-generic neural machine translation data augmentation method

X Liu, J He, M Liu, Z Yin, L Yin, W Zheng - Electronics, 2023 - mdpi.com
Amid the rapid advancement of neural machine translation, data sparsity has remained a major
obstacle. To address this issue, this study proposes a general data …

A simple but tough-to-beat data augmentation approach for natural language understanding and generation

D Shen, M Zheng, Y Shen, Y Qu, W Chen - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial training has been shown to be effective at endowing the learned representations with
stronger generalization ability. However, it typically requires expensive computation to …

Improving neural machine translation by bidirectional training

L Ding, D Wu, D Tao - arXiv preprint arXiv:2109.07780, 2021 - arxiv.org
We present a simple and effective pretraining strategy, bidirectional training (BiT), for neural
machine translation. Specifically, we bidirectionally update the model parameters at the …
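
The abstract truncates here, but the stated idea is to update the model in both translation directions during early training and then tune it normally. One data-level way to realize this is sketched below; the `<2tgt>`/`<2src>` direction tags are an assumption for illustration, and the paper itself may reconstruct the samples differently:

```python
# Double the parallel corpus so one model sees both directions during
# pretraining, then fine-tune on the original direction only.
def make_bidirectional(pairs):
    """pairs: list of (source_sentence, target_sentence) tuples."""
    forward = [(f"<2tgt> {src}", tgt) for src, tgt in pairs]
    backward = [(f"<2src> {tgt}", src) for src, tgt in pairs]
    return forward + backward

corpus = [("guten Morgen", "good morning"), ("danke", "thank you")]
pretrain_data = make_bidirectional(corpus)               # early bidirectional stage
finetune_data = [(f"<2tgt> {s}", t) for s, t in corpus]  # then tune normally
```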

Rejuvenating low-frequency words: Making the most of parallel data in non-autoregressive translation

L Ding, L Wang, X Liu, DF Wong, D Tao… - arXiv preprint arXiv …, 2021 - arxiv.org
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-
autoregressive translation (NAT) models. However, there exists a discrepancy on low …
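
As the snippet describes, KD here is a data-construction step: an autoregressive teacher re-decodes the source side of the parallel corpus, and the NAT student trains on the resulting synthetic targets. A minimal sketch, with a hypothetical `Teacher.translate` standing in for the real decoding API:

```python
# Sequence-level distillation for NAT: the teacher's outputs replace the
# reference targets, giving the student a smoother training distribution.
# `Teacher` is a hypothetical stand-in, not a real library class.
class Teacher:
    def translate(self, src: str) -> str:
        # A real teacher would beam-search a translation here.
        return src.upper()

def build_kd_corpus(teacher, parallel):
    """Swap each reference target for the teacher's decoded output."""
    return [(src, teacher.translate(src)) for src, _ in parallel]

corpus = [("guten Morgen", "good morning"), ("danke", "thank you")]
kd_corpus = build_kd_corpus(Teacher(), corpus)  # NAT model trains on this
```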

A survey on low-resource neural machine translation

R Wang, X Tan, R Luo, T Qin, TY Liu - arXiv preprint arXiv:2107.04239, 2021 - arxiv.org
Neural approaches have achieved state-of-the-art accuracy on machine translation but
suffer from the high cost of collecting large-scale parallel data. Thus, a lot of research has …

Learning to generalize to more: Continuous semantic augmentation for neural machine translation

X Wei, H Yu, Y Hu, R Weng, W Luo, J Xie… - arXiv preprint arXiv …, 2022 - arxiv.org
The principal task in supervised neural machine translation (NMT) is to learn to generate
target sentences conditioned on the source inputs from a set of parallel sentence pairs, and …

To augment or not to augment? A comparative study on text augmentation techniques for low-resource NLP

GG Şahin - Computational Linguistics, 2022 - direct.mit.edu
Data-hungry deep neural networks have established themselves as the de facto standard for
many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the …

Challenges of neural machine translation for short texts

Y Wan, B Yang, DF Wong, LS Chao, L Yao… - Computational …, 2022 - direct.mit.edu
Short texts (STs) appear in a variety of scenarios, including queries, dialog, and entity names.
Most existing studies in neural machine translation (NMT) focus on tackling …