- Academic resource search

[PDF] Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 326 · Related articles · All 7 versions

[PDF] arxiv.org

Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org

Research on speech processing has traditionally considered the task of designing hand-engineered
acoustic features (feature engineering) as a separate distinct problem from the …

Cited by 98 · Related articles · All 3 versions

[PDF] inaoep.mx

Automatic speech recognition: a survey

M Malik, MK Malik, K Mehmood… - Multimedia Tools and …, 2021 - Springer

Recently great strides have been made in the field of automatic speech recognition (ASR) by
using various deep learning techniques. In this study, we present a thorough comparison …

Cited by 280 · Related articles · All 8 versions

[PDF] ieee.org

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 80 · Related articles · All 6 versions
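The snippet above measures progress as a relative reduction in word error rate (WER). As an illustration only, and not code from any of the listed papers or toolkits, here is a minimal sketch of the standard definition: WER = (S + D + I) / N, computed as a word-level Levenshtein distance over the N reference words, plus the relative reduction (baseline − new) / baseline. The function names are hypothetical, and the sketch assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / N.

    Assumes whitespace tokenization and no case/punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Hypothetical helper: relative WER reduction, (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Illustrative numbers only: 20% WER down to 10% WER is a 50% relative reduction.
    print(relative_reduction(0.20, 0.10))
```

Under this definition, "more than 50% relative" means the new WER is less than half the baseline WER. For example, a system at 20% WER would have to reach below 10%. These numbers are made up for illustration and are not results from the survey.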

[PDF] arxiv.org

WeNet: Production oriented streaming and non-streaming end-to-end speech recognition toolkit

Z Yao, D Wu, X Wang, B Zhang, F Yu, C Yang… - arXiv preprint arXiv …, 2021 - arxiv.org

In this paper, we propose an open source, production first, and production ready speech
recognition toolkit called WeNet in which a new two-pass approach is implemented to unify …

Cited by 221 · Related articles · All 4 versions

[PDF] arxiv.org

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Cited by 51 · Related articles

[PDF] arxiv.org

A streaming on-device end-to-end model surpassing server-side conventional model quality and latency

TN Sainath, Y He, B Li, A Narayanan… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art
conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e. …

Cited by 221 · Related articles · All 10 versions

[PDF] arxiv.org

WeNet 2.0: More productive end-to-end speech recognition toolkit

B Zhang, D Wu, Z Peng, X Song, Z Yao, H Lv… - arXiv preprint arXiv …, 2022 - arxiv.org

Recently, we made available WeNet, a production-oriented end-to-end speech recognition
toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address …

Cited by 67 · Related articles · All 2 versions

[PDF] arxiv.org

A better and faster end-to-end model for streaming ASR

B Li, A Gulati, J Yu, TN Sainath, CC Chiu… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for
streaming speech recognition [1] across many dimensions, including quality (as measured …

Cited by 121 · Related articles · All 4 versions

[PDF] arxiv.org

On the comparison of popular end-to-end models for large scale speech recognition

J Li, Y Wu, Y Gaur, C Wang, R Zhao, S Liu - arXiv preprint arXiv …, 2020 - arxiv.org

Recently, there has been a strong push to transition from hybrid models to end-to-end (E2E)
models for automatic speech recognition. Currently, there are three promising E2E methods …

Cited by 148 · Related articles · All 10 versions