A comprehensive survey of multilingual neural machine translation

R Dabre, C Chu, A Kunchukuttan - ACM Computing Surveys (CSUR), 2020 - dl.acm.org

We present a survey on multilingual neural machine translation (MNMT), which has gained
a lot of traction in recent years. MNMT has been useful in improving translation quality as a …

Cited by 310 · All 9 versions

Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing

T Kudo, J Richardson - arXiv preprint arXiv:1808.06226, 2018 - arxiv.org

This paper describes SentencePiece, a language-independent subword tokenizer and
detokenizer designed for Neural-based text processing, including Neural Machine …

Cited by 3581 · All 8 versions

Survey of low-resource machine translation

B Haddow, R Bawden, AVM Barone, J Helcl… - Computational …, 2022 - direct.mit.edu

We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …

Cited by 129 · All 13 versions

Subword regularization: Improving neural network translation models with multiple subword candidates

T Kudo - arXiv preprint arXiv:1804.10959, 2018 - arxiv.org

Subword units are an effective way to alleviate the open vocabulary problems in neural
machine translation (NMT). While sentences are usually converted into unique subword …

Cited by 1236 · All 4 versions
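
The idea named in this entry, training on several candidate segmentations instead of a single fixed one, can be illustrated with a small sketch. It assumes a toy unigram language model over subword pieces. The vocabulary `VOCAB` and its log-probabilities are hypothetical, and the brute-force enumeration in `segmentations` is only feasible for very short strings, so it is an illustration and not the sampling procedure used in the paper. Each candidate is drawn with probability proportional to P(segmentation)^alpha, where `alpha` acts as a smoothing parameter.

```python
import math
import random
from typing import Dict, List, Optional

# Hypothetical toy unigram model: subword piece -> log-probability (illustrative values only).
VOCAB: Dict[str, float] = {
    "h": -5.0, "e": -5.0, "l": -5.0, "o": -5.0,
    "he": -3.0, "ll": -3.5, "lo": -3.5, "hell": -4.0, "hello": -2.5,
}


def segmentations(text: str, vocab: Dict[str, float]) -> List[List[str]]:
    """Enumerate every split of `text` into in-vocabulary pieces (exponential; toy inputs only)."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results


def log_prob(seg: List[str], vocab: Dict[str, float]) -> float:
    """Unigram log-probability of a segmentation: the sum of its pieces' log-probabilities."""
    return sum(vocab[piece] for piece in seg)


def sample_segmentation(
    text: str,
    vocab: Dict[str, float],
    alpha: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw one segmentation with probability proportional to P(seg) ** alpha."""
    rng = rng or random.Random()
    candidates = segmentations(text, vocab)
    if not candidates:
        raise ValueError(f"{text!r} cannot be segmented with this vocabulary")
    # alpha * log P(seg) == log(P(seg) ** alpha); subtract the max for numerical stability.
    logits = [alpha * log_prob(seg, vocab) for seg in candidates]
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    demo_rng = random.Random(0)
    for _ in range(3):
        print(sample_segmentation("hello", VOCAB, alpha=0.2, rng=demo_rng))
```

With a large `alpha` the sampler almost always returns the single most probable segmentation (here `['hello']`); as `alpha` approaches 0 the distribution flattens toward uniform over all eight candidates, so a model trained on the samples sees more varied segmentations of the same word.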

Levenshtein transformer

J Gu, C Wang, J Zhao - Advances in neural information …, 2019 - proceedings.neurips.cc

Modern neural sequence generation models are built to either generate tokens step-by-step
from scratch or (iteratively) modify a sequence of tokens bounded by a fixed length. In this …

Cited by 384 · All 8 versions

Multilingual translation from denoising pre-training

Y Tang, C Tran, X Li, PJ Chen, N Goyal… - Findings of the …, 2021 - aclanthology.org

Recent work demonstrates the potential of training one model for multilingual machine
translation. In parallel, denoising pretraining using unlabeled monolingual data as a starting …

Cited by 125 · All 2 versions

CCMatrix: Mining billions of high-quality parallel sentences on the web

H Schwenk, G Wenzek, S Edunov, E Grave… - arXiv preprint arXiv …, 2019 - arxiv.org

We show that margin-based bitext mining in a multilingual sentence space can be applied to
monolingual corpora of billions of sentences. We are using ten snapshots of a curated …

Cited by 206 · All 5 versions
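
The margin-based scoring mentioned in this abstract can be sketched as follows. The sketch implements a ratio margin: the cosine similarity of a candidate pair is divided by the average similarity of each sentence to its k nearest neighbours on the other side, so that "hub" sentences that are close to everything are penalised. It assumes sentence embeddings have already been produced by some multilingual encoder (the toy 3-dimensional vectors are made up), uses brute-force search rather than the approximate nearest-neighbour indexing a billion-sentence corpus would need, mines in the source-to-target direction only, and assumes all similarities are non-negative so the denominator stays positive. The names `mine`, `k`, and `threshold` and their values are illustrative, not the settings used for CCMatrix.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_mean(query: Vector, pool: List[Vector], k: int) -> float:
    """Mean cosine similarity between `query` and its k most similar vectors in `pool`."""
    sims = sorted((cosine(query, v) for v in pool), reverse=True)[:k]
    return sum(sims) / len(sims)


def ratio_margin(x: Vector, y: Vector, src: List[Vector], tgt: List[Vector], k: int) -> float:
    """cos(x, y) divided by the average of x's k-NN mean (over tgt) and y's k-NN mean (over src)."""
    return cosine(x, y) / ((knn_mean(x, tgt, k) + knn_mean(y, src, k)) / 2)


def mine(
    src: List[Vector], tgt: List[Vector], k: int = 2, threshold: float = 1.0
) -> List[Tuple[int, int, float]]:
    """Forward-only mining: keep each source sentence's best target if its margin clears `threshold`."""
    pairs: List[Tuple[int, int, float]] = []
    for i, x in enumerate(src):
        scores = [ratio_margin(x, y, src, tgt, k) for y in tgt]
        j = max(range(len(tgt)), key=lambda idx: scores[idx])
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


if __name__ == "__main__":
    # Toy embeddings (hypothetical): each target is a slightly perturbed copy of one source.
    src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    tgt = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    for i, j, score in mine(src, tgt):
        print(i, j, round(score, 2))
```

In this sketch a margin above 1 means cos(x, y) exceeds the average of the two neighbourhood means, so the pair stands out from each sentence's other near matches; that is why the illustrative threshold is expressed on the margin scale rather than as a raw cosine value.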

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

G Ramesh, S Doddapaneni, A Bheemaraj… - Transactions of the …, 2022 - direct.mit.edu

We present Samanantar, the largest publicly available parallel corpora collection for Indic
languages. The collection contains a total of 49.7 million sentence pairs between English …

Cited by 108 · All 11 versions

A survey of domain adaptation for neural machine translation

C Chu, R Wang - arXiv preprint arXiv:1806.00258, 2018 - arxiv.org

Neural machine translation (NMT) is a deep learning based approach for machine
translation, which yields the state-of-the-art translation performance in scenarios where …

Cited by 297 · All 3 versions

A voyage on neural machine translation for Indic languages

SK Sheshadri, D Gupta, MR Costa-Jussà - Procedia Computer Science, 2023 - Elsevier

With the invention of deep learning concepts, Machine Translation (MT) migrated towards
Neural Machine Translation (NMT) architectures, eventually from Statistical Machine …

Cited by 10 · All 2 versions