Making more of little data: Improving low-resource automatic speech recognition using data augmentation

M Bartelds, N San, B McDonnell, D Jurafsky… - arXiv preprint arXiv …, 2023 - arxiv.org
The performance of automatic speech recognition (ASR) systems has advanced
substantially in recent years, particularly for languages for which a large amount of …

Mitigating bias against non-native accents

Y Zhang, Y Zhang, BM Halpern, T Patel… - Proceedings of the …, 2022 - research.tudelft.nl
Automatic speech recognition (ASR) systems have seen substantial improvements in the
past decade; however, not for all speaker groups. Recent research shows that bias exists …

Accented speech recognition: Benchmarking, pre-training, and diverse data

A Aksënova, Z Chen, CC Chiu, D van Esch… - arXiv preprint arXiv …, 2022 - arxiv.org
Building inclusive speech recognition systems is a crucial step towards developing
technologies that speakers of all language varieties can use. Therefore, ASR systems must …

Voice conversion based augmentation and a hybrid CNN-LSTM model for improving speaker-independent keyword recognition on limited datasets

YA Wubet, KY Lian - IEEE Access, 2022 - ieeexplore.ieee.org
Keyword recognition is the basis of speech recognition, and its application is rapidly
increasing in keyword spotting, robotics, and smart home surveillance. Because of these …

Iteratively Improving Speech Recognition and Voice Conversion

MK Singh, N Takahashi, O Naoyuki - arXiv preprint arXiv:2305.15055, 2023 - arxiv.org
Many existing works on voice conversion (VC) tasks use automatic speech recognition
(ASR) models for ensuring linguistic consistency between source and converted samples …

Comparison of modern and traditional Slovak children's speech recognition

A Buday, J Juhár, A Čižmár… - 2023 World Symposium …, 2023 - ieeexplore.ieee.org
We compare two distinct speech recognition approaches, namely Hidden Markov models
mixed with deep neural networks and modern end-to-end neural speech recognition …

ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion

E Casanova, C Shulby, A Korolev, AC Junior… - arXiv preprint arXiv …, 2022 - arxiv.org
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice
conversion applied to data augmentation for automatic speech recognition (ASR) systems in …

Enhancing Automatic Speech Recognition with Personalized Models: Improving Accuracy through Individualized Fine-tuning

V Brydinskyi, D Sabodashko, Y Khoma… - IEEE …, 2024 - ieeexplore.ieee.org
Automatic speech recognition (ASR) systems have become increasingly popular in recent
years due to their ability to convert spoken language into text. Nonetheless, despite their …

Non-Parallel Voice Conversion for ASR Augmentation

G Wang, A Rosenberg, B Ramabhadran… - arXiv preprint arXiv …, 2022 - arxiv.org
Automatic speech recognition (ASR) needs to be robust to speaker differences. Voice
Conversion (VC) modifies speaker characteristics of input speech. This is an attractive …

Improving child speech recognition with augmented child-like speech

Y Zhang, Z Yue, T Patel, O Scharenborg - arXiv preprint arXiv:2406.10284, 2024 - arxiv.org
State-of-the-art ASRs show suboptimal performance for child speech. The scarcity of child
speech limits the development of child speech recognition (CSR). Therefore, we studied …