A data augmentation method for English-Vietnamese neural machine translation

NL Pham, TV Pham - IEEE Access, 2023 - ieeexplore.ieee.org
The translation quality of machine translation systems depends on the parallel corpus used
for training, particularly on the quantity and quality of the corpus. However, building a high …

Bridging the data gap between training and inference for unsupervised neural machine translation

Z He, X Wang, R Wang, S Shi, Z Tu - arXiv preprint arXiv:2203.08394, 2022 - arxiv.org
Back-translation is a critical component of Unsupervised Neural Machine Translation
(UNMT), which generates pseudo parallel data from target monolingual data. A UNMT …
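Back-translation, as summarized in the entry above, pairs each authentic target-language sentence with a machine-generated source sentence. A minimal Python sketch of that data-generation step follows, assuming a hypothetical `reverse_translate` callable that stands in for a trained target-to-source model. The toy lexicon is purely illustrative, and this is the generic back-translation step, not the UNMT training procedure proposed by He et al.

```python
from typing import Callable, List, Tuple


def back_translate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build pseudo-parallel (source, target) pairs from target-side monolingual text.

    reverse_translate is a hypothetical stand-in for a target->source model.
    """
    pairs: List[Tuple[str, str]] = []
    for tgt in target_monolingual:
        synthetic_src = reverse_translate(tgt)  # machine-generated source side
        pairs.append((synthetic_src, tgt))      # target side stays authentic
    return pairs


# Hypothetical word-by-word "reverse model" (illustration only, not a real NMT system).
TOY_LEXICON = {"hello": "xin chào", "world": "thế giới"}


def toy_reverse(sentence: str) -> str:
    # Look up each word; unknown words are copied through unchanged.
    return " ".join(TOY_LEXICON.get(w, w) for w in sentence.split())


pseudo = back_translate(["hello world"], toy_reverse)
print(pseudo)  # [('xin chào thế giới', 'hello world')]
```

The resulting pairs would then be mixed with (or, in the unsupervised setting, substituted for) genuine parallel data when training the forward source-to-target model.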

Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation

Z Miao, W Zhang, J Su, X Li, J Luan… - Proceedings of the …, 2023 - aclanthology.org
Conventional knowledge distillation (KD) approaches are commonly employed to compress
neural machine translation (NMT) models. However, they only obtain one lightweight …

Refining low-resource unsupervised translation by language disentanglement of multilingual translation model

XP Nguyen, S Joty, K Wu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Numerous recent works on unsupervised machine translation (UMT) imply that competent
unsupervised translations of low-resource and unrelated languages, such as Nepali or …

Latent constraints on unsupervised text-graph alignment with information asymmetry

J Tian, W Chen, Y Li, C Fan, H He, Y Jin - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Unsupervised text-graph alignment (UTGA) is a fundamental task that bidirectionally
generates texts and graphs without parallel data. Most available models of UTGA suffer from …

Inquiries into Farmers' Perception of Biodiversity in Vietnam: A Systematic Analysis

TP Pham, NTK Chi, TA Truong… - Forum for Social …, 2023 - Taylor & Francis
Conserving biodiversity has become more important for tropical countries, where agricultural
production is characterized by a large number of small farms scattered over wide areas conducting …

GA-SCS: Graph-augmented source code summarization

M Zhang, G Zhou, W Yu, N Huang, W Liu - ACM Transactions on Asian …, 2023 - dl.acm.org
Automatic source code summarization system aims to generate a valuable natural language
description for a program, which can facilitate software development and maintenance, code …

CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data

KY Hong, L Han, R Batista-Navarro… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural Machine Translation (NMT) for low-resource languages remains a challenging task for
NLP researchers. In this work, we deploy a standard data augmentation …

BaSFormer: A Balanced Sparsity Regularized Attention Network for Transformer

S Jiang, Q Chen, Y Xiang, Y Pan… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Attention networks often make decisions relying solely on a few tokens, even if
those reliances are not truly indicative of the underlying meaning or intention of the full …

AugVic: Exploiting bitext vicinity for low-resource NMT

T Mohiuddin, MS Bari, S Joty - arXiv preprint arXiv:2106.05141, 2021 - arxiv.org
The success of Neural Machine Translation (NMT) largely depends on the availability of
large bitext training corpora. Due to the lack of such large corpora in low-resource language …