Neural machine translation for low-resource languages: A survey

S Ranathunga, ESA Lee, M Prifti Skenduli… - ACM Computing …, 2023 - dl.acm.org
Neural Machine Translation (NMT) has seen tremendous growth in the last ten years since
the early 2000s and has already entered a mature phase. While considered the most widely …

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in
low-resource domains, new tasks, and the popularity of large-scale neural networks that require …

Data augmentation techniques in natural language processing

LFAO Pellicer, TM Ferreira, AHR Costa - Applied Soft Computing, 2023 - Elsevier
Data Augmentation (DA) methods – a family of techniques designed for synthetic generation
of training data – have shown remarkable results in various Deep Learning and Machine …

Progressive transformers for end-to-end sign language production

B Saunders, NC Camgoz, R Bowden - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer
The goal of automatic Sign Language Production (SLP) is to translate spoken language to a
continuous stream of sign language video at a level comparable to a human translator. If this …

A multilingual parallel corpora collection effort for Indian languages

S Siripragada, J Philip, VP Namboodiri… - arXiv preprint arXiv …, 2020 - arxiv.org
We present sentence aligned parallel corpora across 10 Indian Languages - Hindi, Telugu,
Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi, Punjabi, and English - many of …

Diving deep into context-aware neural machine translation

J Huo, C Herold, Y Gao, L Dahlmann, S Khadivi… - arXiv preprint arXiv …, 2020 - arxiv.org
Context-aware neural machine translation (NMT) is a promising direction to improve the
translation quality by making use of the additional context, e.g., document-level translation, or …

Product answer generation from heterogeneous sources: A new benchmark and best practices

X Shen, G Barlacchi, M Del Tredici… - Proceedings of the …, 2022 - aclanthology.org
It is of great value to answer product questions based on heterogeneous information
sources available on web product pages, e.g., semi-structured attributes, text descriptions …

Best practices and lessons learned on synthetic data for language models

R Liu, J Wei, F Liu, C Si, Y Zhang, J Rao… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of AI models relies on the availability of large, diverse, and high-quality
datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and …

A survey of orthographic information in machine translation

BR Chakravarthi, P Rani, M Arcan, JP McCrae - SN computer science, 2021 - Springer
Machine translation is one of the applications of natural language processing which
has been explored in different languages. Recently researchers started paying attention …

Rethinking label smoothing on multi-hop question answering

Z Yin, Y Wang, X Hu, Y Wu, H Yan, X Zhang… - … National Conference on …, 2023 - Springer
Multi-Hop Question Answering (MHQA) is a significant area in question answering,
requiring multiple reasoning components, including document retrieval, supporting sentence …