[PDF][PDF] Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Automatic speech recognition using advanced deep learning approaches: A survey

H Kheddar, M Hemis, Y Himeur - Information Fusion, 2024 - Elsevier
Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Unsupervised cross-lingual representation learning for speech recognition

A Conneau, A Baevski, R Collobert… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents XLSR which learns cross-lingual speech representations by pretraining
a single model from the raw waveform of speech in multiple languages. We build on …

Deep transfer learning for automatic speech recognition: Towards better generalization

H Kheddar, Y Himeur, S Al-Maadeed, A Amira… - Knowledge-Based …, 2023 - Elsevier
Automatic speech recognition (ASR) has recently become an important challenge when
using deep learning (DL). It requires large-scale training datasets and high computational …

Massively multilingual ASR: 50 languages, 1 model, 1 billion parameters

V Pratap, A Sriram, P Tomasello, A Hannun… - arXiv preprint arXiv …, 2020 - arxiv.org
We study training a single acoustic model for multiple languages with the aim of improving
automatic speech recognition (ASR) performance on low-resource languages, and over-all …

Large-scale multilingual speech recognition with a streaming end-to-end model

A Kannan, A Datta, TN Sainath, E Weinstein… - arXiv preprint arXiv …, 2019 - arxiv.org
Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic
speech recognition (ASR) coverage of the world's languages. They have shown …

Neural natural language generation: A survey on multilinguality, multimodality, controllability and learning

E Erdem, M Kuyu, S Yagcioglu, A Frank… - Journal of Artificial …, 2022 - jair.org
Developing artificial learning systems that can understand and generate natural language
has been one of the long-standing goals of artificial intelligence. Recent decades have …

Parp: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

Dinosr: Self-distillation and online clustering for self-supervised speech representation learning

AH Liu, HJ Chang, M Auli, WN Hsu… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we introduce self-distillation and online clustering for self-supervised speech
representation learning (DinoSR) which combines masked language modeling, self …