Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Confidence score based speaker adaptation of conformer speech recognition systems

J Deng, X Xie, T Wang, M Cui, B Xue… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Speaker adaptation techniques provide a powerful solution to customise automatic speech
recognition (ASR) systems for individual users. Practical application of unsupervised model …

Confidence score based conformer speaker adaptation for speech recognition

J Deng, X Xie, T Wang, M Cui, B Xue, Z Jin… - arXiv preprint arXiv …, 2022 - arxiv.org
A key challenge for automatic speech recognition (ASR) systems is to model the speaker
level variability. In this paper, compact speaker dependent learning hidden unit contributions …

Towards unsupervised learning of speech features in the wild

M Rivière, E Dupoux - 2021 IEEE Spoken Language …, 2021 - ieeexplore.ieee.org
Recent work on unsupervised contrastive learning of speech representation has shown
promising results, but so far has mostly been applied to clean, curated speech datasets. Can …

Speaker-aware speech-transformer

Z Fan, J Li, S Zhou, B Xu - 2019 IEEE Automatic Speech …, 2019 - ieeexplore.ieee.org
Recently, end-to-end (E2E) models have become a competitive alternative to the conventional
hybrid automatic speech recognition (ASR) systems. However, they still suffer from speaker …

Investigating adaptation and transfer learning for end-to-end spoken language understanding from speech

N Tomashenko, A Caubriere, Y Estève - Interspeech 2019, 2019 - hal.science
This work investigates speaker adaptation and transfer learning for spoken language
understanding (SLU). We focus on the direct extraction of semantic tags from the audio …

Listen, attend, spell and adapt: Speaker adapted sequence-to-sequence ASR

F Weninger, J Andrés-Ferrer, X Li, P Zhan - arXiv preprint arXiv …, 2019 - arxiv.org
Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the-art
performances while having clear advantages in terms of simplicity. However, comparisons …

Using personalized speech synthesis and neural language generator for rapid speaker adaptation

Y Huang, L He, W Wei, W Gale, J Li… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We propose to use the personalized speech synthesis and the neural language generator to
synthesize content relevant personalized speech for rapid speaker adaptation. It has two …