From machine translation to code-switching: Generating high-quality code-switched text

I Tarunesh, S Kumar, P Jyothi - arXiv preprint arXiv:2107.06483, 2021 - arxiv.org
Generating code-switched text is a problem of growing interest, especially given the scarcity
of corpora containing large volumes of real code-switched text. In this work, we adapt a state …

CoMeT: Towards code-mixed translation using parallel monolingual sentences

D Gautam, P Kodali, K Gupta, A Goel… - Proceedings of the …, 2021 - aclanthology.org
Code-mixed languages are very popular in multilingual societies around the world, yet the
resources lag behind to enable robust systems on such languages. A major contributing …

Prompting multilingual large language models to generate code-mixed texts: The case of South East Asian languages

ZX Yong, R Zhang, JZ Forde, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality
and low-cost code-mixed data remains a challenge for natural language processing …

Investigating lexical replacements for Arabic-English code-switched data augmentation

I Hamed, N Habash, S Abdennadher, NT Vu - arXiv preprint arXiv …, 2022 - arxiv.org
Data sparsity is a main problem hindering the development of code-switching (CS) NLP
systems. In this paper, we investigate data augmentation techniques for synthesizing …

X-RiSAWOZ: High-quality end-to-end multilingual dialogue datasets and few-shot agents

M Moradshahi, T Shen, K Bali, M Choudhury… - arXiv preprint arXiv …, 2023 - arxiv.org
Task-oriented dialogue research has mainly focused on a few popular languages like
English and Chinese, due to the high dataset creation cost for a new language. To reduce …

IndoRobusta: towards robustness against diverse code-mixed Indonesian local languages

MF Adilazuarda, S Cahyawijaya, GI Winata… - arXiv preprint arXiv …, 2023 - arxiv.org
Significant progress has been made on Indonesian NLP. Nevertheless, exploration of the
code-mixing phenomenon in Indonesian is limited, despite many languages being …

Overview and results of MixMT shared-task at WMT 2022

V Srivastava, M Singh - … of the Seventh Conference on Machine …, 2022 - aclanthology.org
In this paper, we present an overview of the WMT 2022 shared task on code-mixed machine
translation (MixMT). In this shared task, we hosted two code-mixed machine translation …

CoCoa: An Encoder-Decoder Model for Controllable Code-switched Generation

S Mondal, S Pathak, P Jyothi… - Proceedings of the 2022 …, 2022 - aclanthology.org
Code-switching has seen growing interest in recent years as an important multilingual NLP
phenomenon. Generating code-switched text for data augmentation has been sufficiently …

Comparing grammatical theories of code-mixing

A Pratapa, M Choudhury - … of the Seventh Workshop on Noisy …, 2021 - aclanthology.org
Code-mixed text generation systems have found applications in many downstream tasks,
including speech recognition, translation and dialogue. A paradigm of these generation …

Exploring text-to-text transformers for English to Hinglish machine translation with synthetic code-mixing

G Jawahar, EMB Nagoudi, M Abdul-Mageed… - arXiv preprint arXiv …, 2021 - arxiv.org
We describe models focused at the understudied problem of translating between
monolingual and code-mixed language pairs. More specifically, we offer a wide range of …