Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

ASCEND: A spontaneous Chinese-English dataset for code-switching in multi-turn conversation

H Lovenia, S Cahyawijaya, GI Winata, P Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
Code-switching is a speech phenomenon occurring when a speaker switches language
during a conversation. Despite the spontaneous nature of code-switching in conversational …

Improving multilingual and code-switching ASR using large language model generated text

K Hu, TN Sainath, B Li, Y Zhang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We investigate using large language models (LLMs) to generate text-only training data for
improving multilingual and code-switching automatic speech recognition (ASR) through a …

Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

M Penagarikano, A Varona, G Bordel… - Applied Sciences, 2023 - mdpi.com
In this paper, a semisupervised speech data extraction method is presented and applied to
create a new dataset designed for the development of fully bilingual Automatic Speech …

Leveraging phone mask training for phonetic-reduction-robust E2E Uyghur speech recognition

G Ma, P Hu, J Kang, S Huang, H Huang - arXiv preprint arXiv:2204.00819, 2022 - arxiv.org
In Uyghur speech, consonant and vowel reduction are often encountered, especially in
spontaneous speech with high speech rate, which will cause a degradation of speech …

Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition

Q Wang, H Li - Proceedings of the 6th Workshop on …, 2023 - aclanthology.org
Recognizing code-switching (CS) speech often presents challenges for an automatic
speech recognition system (ASR) due to limited linguistic context in short monolingual …

Optimizing bilingual neural transducer with synthetic code-switching text generation

T Nguyen, N Tran, L Deng, TF da Silva… - arXiv preprint arXiv …, 2022 - arxiv.org
Code-switching describes the practice of using more than one language in the same
sentence. In this study, we investigate how to optimize a neural transducer based bilingual …

Bilingual end-to-end ASR with byte-level subwords

L Deng, R Hsiao, A Ghoshal - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this paper, we investigate how the output representation of an end-to-end neural network
affects multilingual automatic speech recognition (ASR). We study different representations …

Context Conditioning via Surrounding Predictions for Non-recurrent CTC Models

B Naowarat, C Piansaddhayanon… - IEEE …, 2023 - ieeexplore.ieee.org
Connectionist Temporal Classification (CTC) loss has become widely used in sequence
modeling tasks such as Automatic Speech Recognition (ASR) and Handwritten Text …

Semisupervised training of a fully bilingual ASR system for Basque and Spanish

M Penagarikano, A Varona, G Bordel… - Proceedings of the …, 2022 - researchgate.net
Automatic speech recognition (ASR) of speech signals with code-switching (an abrupt
language change common in bilingual communities) typically requires spoken language …