Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Automatic speech recognition using advanced deep learning approaches: A survey
Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …
Scaling speech technology to 1,000+ languages
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …
Unsupervised cross-lingual representation learning for speech recognition
This paper presents XLSR which learns cross-lingual speech representations by pretraining
a single model from the raw waveform of speech in multiple languages. We build on …
Deep transfer learning for automatic speech recognition: Towards better generalization
Automatic speech recognition (ASR) has recently become an important challenge when
using deep learning (DL). It requires large-scale training datasets and high computational …
Massively multilingual ASR: 50 languages, 1 model, 1 billion parameters
We study training a single acoustic model for multiple languages with the aim of improving
automatic speech recognition (ASR) performance on low-resource languages, and over-all …
Large-scale multilingual speech recognition with a streaming end-to-end model
Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic
speech recognition (ASR) coverage of the world's languages. They have shown …
Neural natural language generation: A survey on multilinguality, multimodality, controllability and learning
Developing artificial learning systems that can understand and generate natural language
has been one of the long-standing goals of artificial intelligence. Recent decades have …
PARP: Prune, adjust and re-prune for self-supervised speech recognition
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …
DinoSR: Self-distillation and online clustering for self-supervised speech representation learning
In this paper, we introduce self-distillation and online clustering for self-supervised speech
representation learning (DinoSR) which combines masked language modeling, self …