Advancing RNN transducer technology for speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 338 | Related articles | All 7 versions

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 88 | Related articles | All 6 versions

Scaling end-to-end models for large-scale multilingual ASR

B Li, R Pang, TN Sainath, A Gulati… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

Building ASR models across many languages is a challenging multi-task learning problem
due to large variations and heavily unbalanced data. Existing work has shown positive …

Cited by 72 | Related articles | All 4 versions

Massively multilingual ASR: A lifelong learning solution

B Li, R Pang, Y Zhang, TN Sainath… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

The development of end-to-end models has largely sped up the research in massively
multilingual automatic speech recognition (MMASR). Previous research has demonstrated …

Cited by 39 | Related articles

Residual adapters for parameter-efficient ASR adaptation to atypical and accented speech

K Tomanek, V Zayats, D Padfield, K Vaillancourt… - arXiv preprint arXiv …, 2021 - arxiv.org

Automatic Speech Recognition (ASR) systems are often optimized to work best for speakers
with canonical speech patterns. Unfortunately, these systems perform poorly when tested on …

Cited by 48 | Related articles | All 6 versions

Traditional machine learning models and bidirectional encoder representations from transformer (BERT)–based automatic classification of tweets about …

JA Benítez-Andrades, JM Alija-Pérez… - JMIR medical …, 2022 - medinform.jmir.org

Background: Eating disorders affect an increasing number of people. Social networks
provide information that can help. Objective: We aimed to find machine learning models …

Cited by 34 | Related articles | All 16 versions

Diagonal state space augmented transformers for speech recognition

G Saon, A Gupta, X Cui - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

We improve on the popular conformer architecture by replacing the depthwise temporal
convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant …

Cited by 23 | Related articles | All 4 versions

On the limit of English conversational speech recognition

Z Tüske, G Saon, B Kingsbury - arXiv preprint arXiv:2105.00982, 2021 - arxiv.org

In our previous work we demonstrated that a single headed attention encoder-decoder
model is able to reach state-of-the-art results in conversational speech recognition. In this …

Cited by 51 | Related articles | All 6 versions

Large-scale streaming end-to-end speech translation with neural transducers

J Xue, P Wang, J Li, M Post, Y Gaur - arXiv preprint arXiv:2204.05352, 2022 - arxiv.org

Neural transducers have been widely used in automatic speech recognition (ASR). In this
paper, we introduce it to streaming end-to-end speech translation (ST), which aims to …

Cited by 21 | Related articles | All 5 versions

Integrating text inputs for training and adapting RNN transducer ASR models

S Thomas, B Kingsbury, G Saon… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Compared to hybrid automatic speech recognition (ASR) systems that use a modular
architecture in which each component can be independently adapted to a new domain …

Cited by 24 | Related articles | All 4 versions