Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Advanced long-context end-to-end speech recognition using context-expanded transformers

T Hori, N Moritz, C Hori, J Le Roux - arXiv preprint arXiv:2104.09426, 2021 - arxiv.org
This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …

Confidence score based speaker adaptation of conformer speech recognition systems

J Deng, X Xie, T Wang, M Cui, B Xue… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Speaker adaptation techniques provide a powerful solution to customise automatic speech
recognition (ASR) systems for individual users. Practical application of unsupervised model …

An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings

L Serafini, S Cornell, G Morrone, E Zovato… - Computer Speech & …, 2023 - Elsevier
We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …

Attention-inspired artificial neural networks for speech processing: A systematic review

N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …

Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need

Y Huang, G Ye, J Li, Y Gong - Interspeech, 2021 - isca-archive.org
Conformer transducer achieves new state-of-the-art end-to-end (E2E) system performance
and has become increasingly appealing for production. In this paper, we study how to …

Transformer-Based Long-Context End-to-End Speech Recognition

T Hori, N Moritz, C Hori, J Le Roux - Interspeech, 2020 - isca-archive.org
This paper presents an approach to long-context end-to-end automatic speech recognition
(ASR) using Transformers, aiming at improving ASR accuracy for long audio recordings …

AdaStreamLite: Environment-adaptive Streaming Speech Recognition on Mobile Devices

Y Wei, J Xiong, H Liu, Y Yu, J Pan, J Du - Proceedings of the ACM on …, 2024 - dl.acm.org
Streaming speech recognition aims to transcribe speech to text in a streaming manner,
providing real-time speech interaction for smartphone users. However, it is not trivial to …