Integrating an unsupervised transliteration model into statistical machine translation

Y Belinkov, N Durrani, F Dalvi, H Sajjad… - Computational …, 2020 - direct.mit.edu

Despite the recent success of deep neural networks in natural language processing and
other spheres of artificial intelligence, their interpretability remains a challenge. We analyze …

Cited by 83 · Related articles · All 7 versions

[PDF] mdpi.com

A scenario-generic neural machine translation data augmentation method

X Liu, J He, M Liu, Z Yin, L Yin, W Zheng - Electronics, 2023 - mdpi.com

Amid the rapid advancement of neural machine translation, the challenge of data sparsity
has been a major obstacle. To address this issue, this study proposes a general data …

Cited by 62 · Related articles · All 3 versions

[PDF] ed.ac.uk

[PDF] Neural machine translation of rare words with subword units

R Sennrich - arXiv preprint arXiv:1508.07909, 2015 - research.ed.ac.uk

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but
translation is an open-vocabulary problem. Previous work addresses the translation of out-of …

Cited by 9191 · Related articles
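
The subword-units approach in the entry above is usually associated with byte-pair encoding (BPE): repeatedly merge the most frequent adjacent symbol pair, then segment any word, including unseen ones, by replaying those merges. The following is a minimal Python sketch of that idea. It is not the paper's released tool. The helper names (get_pair_counts, merge_pair, learn_bpe, apply_bpe), the toy vocabulary, and the "</w>" end-of-word marker are all illustrative assumptions.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[left, right] += freq
    return pairs


def merge_pair(pair, vocab):
    """Join every whole-symbol occurrence of `pair` into a single symbol."""
    bigram = re.escape(" ".join(pair))
    # Lookarounds ensure we only match complete symbols, not substrings of them.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Greedily merge the most frequent symbol pair, up to num_merges times."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for left, right in merges:
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical toy word-frequency table: characters separated by spaces, plus an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, final_vocab = learn_bpe(vocab, num_merges=10)
print(merges[:3])                    # → [('e', 's'), ('es', 't'), ('est', '</w>')]
print(apply_bpe("lowest", merges))   # → ['low', 'est</w>']
```

Although "lowest" never occurs in the toy vocabulary, it still receives a segmentation built from known subwords. This is how a fixed subword inventory can cover an open vocabulary.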

[PDF] arxiv.org

What do neural machine translation models learn about morphology?

Y Belinkov, N Durrani, F Dalvi, H Sajjad… - arXiv preprint arXiv …, 2017 - arxiv.org

Neural machine translation (MT) models obtain state-of-the-art performance while
maintaining a simple, end-to-end architecture. However, little is known about what these …

Cited by 470 · Related articles · All 20 versions

[PDF] aclanthology.org

[PDF] Farasa: A fast and furious segmenter for arabic

A Abdelali, K Darwish, N Durrani… - Proceedings of the 2016 …, 2016 - aclanthology.org

In this paper, we present Farasa, a fast and accurate Arabic segmenter. Our approach is
based on SVM-rank using linear kernels. We measure the performance of the segmenter in …

Cited by 484 · Related articles · All 3 versions

[PDF] ed.ac.uk

N-gram counts and language models from the common crawl

C Buck, K Heafield, B Van Ooyen - Proceedings of the Language …, 2014 - research.ed.ac.uk

We contribute 5-gram counts and language models trained on the Common Crawl corpus, a
collection of over 9 billion web pages. This release improves upon the Google n-gram counts …

Cited by 238 · Related articles · All 17 versions
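
The entry above concerns 5-gram counts. To make concrete what such counts are, the following minimal in-memory Python sketch counts every n-gram of order 1 through 5. It assumes whitespace tokenization and <s>/</s> sentence-boundary markers. The function name count_ngrams and the toy corpus are hypothetical. A web-scale release would rely on streaming, disk-backed tooling rather than a Python Counter. Any language model built from these counts would also apply smoothing, which this sketch omits.

```python
from collections import Counter


def count_ngrams(sentences, max_order=5):
    """Count all n-grams of order 1..max_order over whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        # Boundary markers let n-grams capture sentence starts and ends.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy corpus.
corpus = ["the cat sat on the mat", "the cat ran"]
counts = count_ngrams(corpus, max_order=5)
print(counts[("the",)])                         # → 3
print(counts[("the", "cat")])                   # → 2
print(counts[("<s>", "the", "cat", "sat", "on")])  # → 1
```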

[PDF] arxiv.org

How grammatical is character-level neural machine translation? Assessing MT quality with contrastive translation pairs

R Sennrich - arXiv preprint arXiv:1612.04629, 2016 - arxiv.org

Analysing translation quality with regard to specific linguistic phenomena has historically
been difficult and time-consuming. Neural machine translation has the attractive property …

Cited by 183 · Related articles · All 9 versions

[PDF] arxiv.org

When being unseen from mBERT is just the beginning: Handling new languages with multilingual language models

B Muller, A Anastasopoulos, B Sagot… - arXiv preprint arXiv …, 2020 - arxiv.org

Transfer learning based on pretraining language models on a large amount of raw data has
become a new norm to reach state-of-the-art performance in NLP. Still, it remains unclear …

Cited by 143 · Related articles · All 10 versions

[PDF] aclanthology.org

[PDF] A comparative quality evaluation of PBSMT and NMT using professional translators

S Castilho, J Moorkens, F Gaspari… - … XVI: Research Track, 2017 - aclanthology.org

Interactive machine translation research has focused primarily on predictive typing, which
requires a human to type parts of the translation. This paper explores an interactive setting in …

Cited by 133 · Related articles · All 12 versions

[PDF] aclanthology.org

Aksharantar: Open Indic-language transliteration datasets and models for the next billion users

Y Madhani, S Parthan, P Bedekar, G Nc… - Findings of the …, 2023 - aclanthology.org

Transliteration is very important in the Indian language context due to the usage of multiple
scripts and the widespread use of romanized inputs. However, few training and evaluation …

Cited by 11 · Related articles