A Comparison of sequence-to-sequence models for speech recognition.

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 118 · Related articles · All 6 versions

[PDF] Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 338 · Related articles · All 7 versions

A review on the attention mechanism of deep learning

Z Niu, G Zhong, H Yu - Neurocomputing, 2021 - Elsevier

Attention has arguably become one of the most important concepts in the deep learning
field. It is inspired by the biological systems of humans that tend to focus on the distinctive …

Cited by 1779 · Related articles · All 4 versions

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 88 · Related articles · All 6 versions

STGAT: Modeling spatial-temporal interactions for human trajectory prediction

Y Huang, H Bi, Z Li, T Mao… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Human trajectory prediction is challenging and critical in various applications (e.g.,
autonomous vehicles and social robots). Because of the continuity and foresight of the …

Cited by 567 · Related articles · All 6 versions

WeNet: Production oriented streaming and non-streaming end-to-end speech recognition toolkit

Z Yao, D Wu, X Wang, B Zhang, F Yu, C Yang… - arXiv preprint arXiv …, 2021 - arxiv.org

In this paper, we propose an open source, production first, and production ready speech
recognition toolkit called WeNet in which a new two-pass approach is implemented to unify …

Cited by 227 · Related articles · All 4 versions

Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org

Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

Cited by 827 · Related articles · All 7 versions

State-of-the-art speech recognition with sequence-to-sequence models

CC Chiu, TN Sainath, Y Wu… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS),
subsume the acoustic, pronunciation and language model components of a traditional …

Cited by 1439 · Related articles · All 10 versions

Developing real-time streaming transformer transducer for speech recognition on large-scale dataset

X Chen, Y Wu, Z Wang, S Liu… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …

Cited by 189 · Related articles · All 3 versions

Artificial intelligence in clinical and genomic diagnostics

R Dias, A Torkamani - Genome medicine, 2019 - Springer

Artificial intelligence (AI) is the development of computer systems that are able to perform
tasks that normally require human intelligence. Advances in AI software and hardware …

Cited by 350 · Related articles · All 13 versions