Phrase-based & neural unsupervised machine translation

G Lample, M Ott, A Conneau, L Denoyer… - arXiv preprint arXiv …, 2018 - arxiv.org
Machine translation systems achieve near human-level performance on some languages,
yet their effectiveness strongly relies on the availability of large amounts of parallel …

Cheap translation for cross-lingual named entity recognition

S Mayhew, CT Tsai, D Roth - … of the 2017 conference on empirical …, 2017 - aclanthology.org
Recent work in NLP has attempted to deal with low-resource languages but still assumed a
resource level that is not present for most languages, e.g., the availability of Wikipedia in the …

Low-resource neural machine translation: Methods and trends

S Shi, X Wu, R Su, H Huang - ACM Transactions on Asian and Low …, 2022 - dl.acm.org
Neural Machine Translation (NMT) brings promising improvements in translation quality, but
until recently, these models rely on large-scale parallel corpora. As such corpora only exist …

Machine translation of low-resource spoken dialects: Strategies for normalizing Swiss German

PE Honnet, A Popescu-Belis, C Musat… - arXiv preprint arXiv …, 2017 - arxiv.org
The goal of this work is to design a machine translation (MT) system for a low-resource
family of dialects, collectively known as Swiss German, which are widely spoken in …

Flow-adapter architecture for unsupervised machine translation

Y Liu, H Jabbar, H Schütze - arXiv preprint arXiv:2204.12225, 2022 - arxiv.org
In this work, we propose a flow-adapter architecture for unsupervised NMT. It leverages
normalizing flows to explicitly model the distributions of sentence-level latent …

Bilingual lexical extraction based on word alignment for improving corpus search

J Andonovski, B Šandrih, O Kitanović - The Electronic Library, 2019 - emerald.com
Purpose: This paper aims to describe the structure of an aligned Serbian-German literary
corpus (SrpNemKor) contained in a digital library Bibliša. The goal of the research was to …

Two approaches to compilation of bilingual multi-word terminology lists from lexical resources

B Šandrih, C Krstev, R Stanković - Natural Language Engineering, 2020 - cambridge.org
In this paper, we present two approaches and the implemented system for bilingual
terminology extraction that rely on an aligned bilingual domain corpus, a terminology …

Learning translations via matrix completion

D Wijaya, B Callahan, J Hewitt, J Gao, X Ling… - arXiv preprint arXiv …, 2024 - arxiv.org
Bilingual Lexicon Induction is the task of learning word translations without bilingual parallel
corpora. We model this task as a matrix completion problem, and present an effective and …

The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation

N Guerin, S Steinert-Threlkeld, E Chemla - arXiv preprint arXiv …, 2024 - arxiv.org
Unsupervised on-the-fly back-translation, in conjunction with multilingual pretraining, is the
dominant method for unsupervised neural machine translation. Theoretically, however, the …

Round-trip training approach for bilingually low-resource statistical machine translation systems

B Ahmadnia, G Haffari, J Serrano - International Journal of Artificial …, 2019 - academia.edu
Statistical Machine Translation (SMT) is making good progress in recent years.
Since SMT systems are based on data-driven approach, they learn from millions or even …