A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

A review on the attention mechanism of deep learning

Z Niu, G Zhong, H Yu - Neurocomputing, 2021 - Elsevier
Attention has arguably become one of the most important concepts in the deep learning
field. It is inspired by the biological systems of humans that tend to focus on the distinctive …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

STGAT: Modeling spatial-temporal interactions for human trajectory prediction

Y Huang, H Bi, Z Li, T Mao… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Human trajectory prediction is challenging and critical in various applications (e.g.,
autonomous vehicles and social robots). Because of the continuity and foresight of the …

WeNet: Production oriented streaming and non-streaming end-to-end speech recognition toolkit

Z Yao, D Wu, X Wang, B Zhang, F Yu, C Yang… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we propose an open source, production first, and production ready speech
recognition toolkit called WeNet in which a new two-pass approach is implemented to unify …

Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

State-of-the-art speech recognition with sequence-to-sequence models

CC Chiu, TN Sainath, Y Wu… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS),
subsume the acoustic, pronunciation and language model components of a traditional …

Developing real-time streaming transformer transducer for speech recognition on large-scale dataset

X Chen, Y Wu, Z Wang, S Liu… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …

Artificial intelligence in clinical and genomic diagnostics

R Dias, A Torkamani - Genome medicine, 2019 - Springer
Artificial intelligence (AI) is the development of computer systems that are able to perform
tasks that normally require human intelligence. Advances in AI software and hardware …