Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Scaling end-to-end models for large-scale multilingual ASR

B Li, R Pang, TN Sainath, A Gulati… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Building ASR models across many languages is a challenging multi-task learning problem
due to large variations and heavily unbalanced data. Existing work has shown positive …

Massively multilingual ASR: A lifelong learning solution

B Li, R Pang, Y Zhang, TN Sainath… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The development of end-to-end models has largely sped up the research in massively
multilingual automatic speech recognition (MMASR). Previous research has demonstrated …

Residual adapters for parameter-efficient ASR adaptation to atypical and accented speech

K Tomanek, V Zayats, D Padfield, K Vaillancourt… - arXiv preprint arXiv …, 2021 - arxiv.org
Automatic Speech Recognition (ASR) systems are often optimized to work best for speakers
with canonical speech patterns. Unfortunately, these systems perform poorly when tested on …

Traditional machine learning models and bidirectional encoder representations from transformer (BERT)–based automatic classification of tweets about …

JA Benítez-Andrades, JM Alija-Pérez… - JMIR medical …, 2022 - medinform.jmir.org
Background: Eating disorders affect an increasing number of people. Social networks
provide information that can help. Objective: We aimed to find machine learning models …

Diagonal state space augmented transformers for speech recognition

G Saon, A Gupta, X Cui - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We improve on the popular conformer architecture by replacing the depthwise temporal
convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant …

On the limit of English conversational speech recognition

Z Tüske, G Saon, B Kingsbury - arXiv preprint arXiv:2105.00982, 2021 - arxiv.org
In our previous work we demonstrated that a single headed attention encoder-decoder
model is able to reach state-of-the-art results in conversational speech recognition. In this …

Large-scale streaming end-to-end speech translation with neural transducers

J Xue, P Wang, J Li, M Post, Y Gaur - arXiv preprint arXiv:2204.05352, 2022 - arxiv.org
Neural transducers have been widely used in automatic speech recognition (ASR). In this
paper, we introduce them to streaming end-to-end speech translation (ST), which aims to …

Integrating text inputs for training and adapting RNN transducer ASR models

S Thomas, B Kingsbury, G Saon… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Compared to hybrid automatic speech recognition (ASR) systems that use a modular
architecture in which each component can be independently adapted to a new domain …