Optimizing bilingual neural transducer with synthetic code-switching text generation

T Nguyen, N Tran, L Deng, TF da Silva, M Radzihovsky, R Hsiao, H Mason, S Braun
arXiv preprint arXiv:2210.12214, 2022 - arxiv.org
Code-switching describes the practice of using more than one language in the same sentence. In this study, we investigate how to optimize a neural transducer-based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we found that semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech. We analyze how each of the neural transducer's encoders contributes to code-switching performance by measuring encoder-specific recall values, and we evaluate our English/Mandarin system on the ASCEND data set. Our final system achieves a 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set -- reducing the MER by 2.1% absolute compared to the previous literature -- while maintaining good accuracy on the monolingual test sets.
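For context on the reported metric: mixed error rate (MER) is commonly computed like word error rate, but over a mixed tokenization in which Mandarin is counted per character and English per word. The sketch below is not taken from the paper; it is a minimal illustration of that common convention, with the tokenization regex and function names being our own assumptions.

```python
import re

def tokenize_mixed(text):
    """Split a code-switched string into Mandarin characters and English words.

    Assumes CJK characters are counted individually and Latin-script
    words are counted whole -- the usual convention behind MER.
    """
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (1-row DP)."""
    n = len(hyp)
    dp = list(range(n + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def mixed_error_rate(ref_text, hyp_text):
    """MER = token-level edit distance / number of reference tokens."""
    ref = tokenize_mixed(ref_text)
    hyp = tokenize_mixed(hyp_text)
    return edit_distance(ref, hyp) / max(len(ref), 1)

# One substitution (想 -> 要) and one deletion ("to") over 5 reference tokens:
print(mixed_error_rate("我想listen to music", "我要listen music"))  # -> 0.4
```

Counting Mandarin per character avoids penalizing segmentation differences, since Chinese text has no whitespace word boundaries.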