A streaming on-device end-to-end model surpassing server-side conventional model quality and latency

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 311 - Related articles - All 7 versions

Conformer: Convolution-augmented transformer for speech recognition

A Gulati, J Qin, CC Chiu, N Parmar, Y Zhang… - arXiv preprint arXiv …, 2020 - arxiv.org

Recently Transformer and Convolution neural network (CNN) based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural …

Cited by 2736 - Related articles - All 12 versions

Self-supervised learning with random-projection quantizer for speech recognition

CC Chiu, J Qin, Y Zhang, J Yu… - … Conference on Machine …, 2022 - proceedings.mlr.press

We present a simple and effective self-supervised learning approach for speech recognition.
The approach learns a model to predict the masked speech signals, in the form of discrete …

Cited by 118 - Related articles - All 5 versions

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 64 - Related articles - All 6 versions

ContextNet: Improving convolutional neural networks for automatic speech recognition with global context

W Han, Z Zhang, Y Zhang, J Yu, CC Chiu, J Qin… - arXiv preprint arXiv …, 2020 - arxiv.org

Convolutional neural networks (CNN) have shown promising results for end-to-end speech
recognition, albeit still behind other state-of-the-art methods in performance. In this paper …

Cited by 286 - Related articles - All 10 versions

WeNet 2.0: More productive end-to-end speech recognition toolkit

B Zhang, D Wu, Z Peng, X Song, Z Yao, H Lv… - arXiv preprint arXiv …, 2022 - arxiv.org

Recently, we made available WeNet, a production-oriented end-to-end speech recognition
toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address …

Cited by 63 - Related articles - All 2 versions

A better and faster end-to-end model for streaming ASR

B Li, A Gulati, J Yu, TN Sainath, CC Chiu… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

End-to-end (E2E) models have shown to outperform state-of-the-art conventional models for
streaming speech recognition [1] across many dimensions, including quality (as measured …

Cited by 117 - Related articles - All 4 versions

On the comparison of popular end-to-end models for large scale speech recognition

J Li, Y Wu, Y Gaur, C Wang, R Zhao, S Liu - arXiv preprint arXiv …, 2020 - arxiv.org

Recently, there has been a strong push to transition from hybrid models to end-to-end (E2E)
models for automatic speech recognition. Currently, there are three promising E2E methods …

Cited by 145 - Related articles - All 10 versions

Internal language model estimation for domain-adaptive end-to-end speech recognition

Z Meng, S Parthasarathy, E Sun, Y Gaur… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

The external language models (LM) integration remains a challenging task for end-to-end
(E2E) automatic speech recognition (ASR) which has no clear division between acoustic …

Cited by 101 - Related articles - All 5 versions

Developing RNN-T models surpassing high-performance hybrid models with customization capability

J Li, R Zhao, Z Meng, Y Liu, W Wei… - arXiv preprint arXiv …, 2020 - arxiv.org

Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very
promising end-to-end (E2E) model that may replace the popular hybrid model for automatic …

Cited by 110 - Related articles - All 11 versions