A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Transformer transducer: A streamable speech recognition model with transformer encoders and RNN-T loss

Q Zhang, H Lu, H Sak, A Tripathi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In this paper we present an end-to-end speech recognition model with Transformer
encoders that can be used in a streaming speech recognition system. Transformer …

Streaming end-to-end speech recognition for mobile devices

Y He, TN Sainath, R Prabhavalkar… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …

Deep spoken keyword spotting: An overview

I López-Espejo, ZH Tan, JHL Hansen, J Jensen - IEEE Access, 2021 - ieeexplore.ieee.org
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …

Deep context: end-to-end contextual speech recognition

G Pundak, TN Sainath, R Prabhavalkar… - 2018 IEEE spoken …, 2018 - ieeexplore.ieee.org
In automatic speech recognition (ASR) what a user says depends on the particular context
she is in. Typically, this context is represented as a set of word n-grams. In this work, we …

Contextual RNN-T for open domain ASR

M Jain, G Keren, J Mahadeokar, G Zweig… - arXiv preprint arXiv …, 2020 - arxiv.org
End-to-end (E2E) systems for automatic speech recognition (ASR), such as RNN
Transducer (RNN-T) and Listen-Attend-Spell (LAS) blend the individual components of a …

Attention-based end-to-end models for small-footprint keyword spotting

C Shan, J Zhang, Y Wang, L Xie - arXiv preprint arXiv:1803.10916, 2018 - arxiv.org
In this paper, we propose an attention-based end-to-end neural approach for small-footprint
keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality …

Learning efficient representations for keyword spotting with triplet loss

R Vygon, N Mikhaylovskiy - … 2021, St. Petersburg, Russia, September 27 …, 2021 - Springer
In the past few years, triplet loss-based metric embeddings have become a de-facto
standard for several important computer vision problems, most notably, person …

A systematic review on sequence-to-sequence learning with neural network and its models

H Yousuf, M Lahzi, SA Salloum… - International Journal of …, 2021 - researchgate.net
We develop a precise writing survey on sequence-to-sequence learning with neural network
and its models. The primary aim of this report is to enhance the knowledge of the sequence …