End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition

Y Zhang, DS Park, W Han, J Qin… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …

An accurate and rapidly calibrating speech neuroprosthesis

NS Card, M Wairagkar, C Iacobacci… - … England Journal of …, 2024 - Mass Medical Soc
Background: Brain–computer interfaces can enable communication for people with paralysis
by transforming cortical activity associated with attempted speech into text on a computer …

Diagonal state space augmented transformers for speech recognition

G Saon, A Gupta, X Cui - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We improve on the popular conformer architecture by replacing the depthwise temporal
convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant …

Speaker adaptation using spectro-temporal deep features for dysarthric and elderly speech recognition

M Geng, X Xie, Z Ye, T Wang, G Li, S Hu… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
Despite the rapid progress of automatic speech recognition (ASR) technologies targeting
normal speech in recent decades, accurate recognition of dysarthric and elderly speech …

VarArray: Array-geometry-agnostic continuous speech separation

T Yoshioka, X Wang, D Wang, M Tang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Continuous speech separation using a microphone array was shown to be promising in
dealing with the speech overlap problem in natural conversation transcription. This paper …

Bayesian neural network language modeling for speech recognition

B Xue, S Hu, J Xu, M Geng, X Liu… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
State-of-the-art neural network language models (NNLMs) represented by long short term
memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly …

Modular domain adaptation for Conformer-based streaming ASR

Q Li, B Li, D Hwang, TN Sainath… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech data from different domains has distinct acoustic and linguistic characteristics. It is
common to train a single multidomain model such as a Conformer transducer for speech …

Confidence score based speaker adaptation of conformer speech recognition systems

J Deng, X Xie, T Wang, M Cui, B Xue… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Speaker adaptation techniques provide a powerful solution to customise automatic speech
recognition (ASR) systems for individual users. Practical application of unsupervised model …

Efficient training of neural transducer for speech recognition

W Zhou, W Michel, R Schlüter, H Ney - arXiv preprint arXiv:2204.10586, 2022 - arxiv.org
As one of the most popular sequence-to-sequence modeling approaches for speech
recognition, the RNN-Transducer has achieved evolving performance with more and more …