A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Transformer transducer: A streamable speech recognition model with transformer encoders and RNN-T loss
In this paper we present an end-to-end speech recognition model with Transformer
encoders that can be used in a streaming speech recognition system. Transformer …
Streaming end-to-end speech recognition for mobile devices
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …
Deep spoken keyword spotting: An overview
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …
Deep context: end-to-end contextual speech recognition
In automatic speech recognition (ASR) what a user says depends on the particular context
she is in. Typically, this context is represented as a set of word n-grams. In this work, we …
Contextual RNN-T for open domain ASR
End-to-end (E2E) systems for automatic speech recognition (ASR), such as RNN
Transducer (RNN-T) and Listen-Attend-Spell (LAS) blend the individual components of a …
Attention-based end-to-end models for small-footprint keyword spotting
In this paper, we propose an attention-based end-to-end neural approach for small-footprint
keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality …
Learning efficient representations for keyword spotting with triplet loss
R Vygon, N Mikhaylovskiy - … 2021, St. Petersburg, Russia, September 27 …, 2021 - Springer
In the past few years, triplet loss-based metric embeddings have become a de-facto
standard for several important computer vision problems, most notably, person …
A systematic review on sequence-to-sequence learning with neural network and its models.
H Yousuf, M Lahzi, SA Salloum… - International Journal of …, 2021 - researchgate.net
We develop a precise writing survey on sequence-to-sequence learning with neural network
and its models. The primary aim of this report is to enhance the knowledge of the sequence …