A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
learning. The use of multiple processing layers has enabled the creation of models capable …
Deep spoken keyword spotting: An overview
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …
Ast: Audio spectrogram transformer
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the
main building block for end-to-end audio classification models, which aim to learn a direct …
main building block for end-to-end audio classification models, which aim to learn a direct …
Efficient neuromorphic signal processing with loihi 2
The biologically inspired spiking neurons used in neuromorphic computing are nonlinear
filters with dynamic state variables—very different from the stateless neuron models used in …
filters with dynamic state variables—very different from the stateless neuron models used in …
Keyword transformer: A self-attention model for keyword spotting
The Transformer architecture has been successful across many domains, including natural
language processing, computer vision and speech recognition. In keyword spotting, self …
language processing, computer vision and speech recognition. In keyword spotting, self …
Broadcasted residual learning for efficient keyword spotting
Keyword spotting is an important research field because it plays a key role in device wake-
up and user interaction on smart devices. However, it is challenging to minimize errors while …
up and user interaction on smart devices. However, it is challenging to minimize errors while …
Convmixer: Feature interactive convolution with curriculum learning for small footprint and noisy far-field keyword spotting
Building efficient architecture in neural speech processing is paramount to success in
keyword spotting deployment. However, it is very challenging for lightweight models to …
keyword spotting deployment. However, it is very challenging for lightweight models to …
A surrogate gradient spiking baseline for speech command recognition
Artificial neural networks (ANNs) are the basis of recent advances in artificial intelligence
(AI); they typically use real valued neuron responses. By contrast, biological neurons are …
(AI); they typically use real valued neuron responses. By contrast, biological neurons are …
Wav2kws: Transfer learning from speech representations for keyword spotting
With the expanding development of on-device artificial intelligence, voice-enabled devices
such as smart speakers, wearables, and other on-device or edge processing systems have …
such as smart speakers, wearables, and other on-device or edge processing systems have …
Learning efficient representations for keyword spotting with triplet loss
R Vygon, N Mikhaylovskiy - … 2021, St. Petersburg, Russia, September 27 …, 2021 - Springer
In the past few years, triplet loss-based metric embeddings have become a de-facto
standard for several important computer vision problems, most notably, person …
standard for several important computer vision problems, most notably, person …