A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Deep spoken keyword spotting: An overview

I López-Espejo, ZH Tan, JHL Hansen, J Jensen - IEEE Access, 2021 - ieeexplore.ieee.org
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …

AST: Audio spectrogram transformer

Y Gong, YA Chung, J Glass - arXiv preprint arXiv:2104.01778, 2021 - arxiv.org
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the
main building block for end-to-end audio classification models, which aim to learn a direct …

Efficient neuromorphic signal processing with Loihi 2

G Orchard, EP Frady, DBD Rubin… - … IEEE Workshop on …, 2021 - ieeexplore.ieee.org
The biologically inspired spiking neurons used in neuromorphic computing are nonlinear
filters with dynamic state variables—very different from the stateless neuron models used in …

Keyword transformer: A self-attention model for keyword spotting

A Berg, M O'Connor, MT Cruz - arXiv preprint arXiv:2104.00769, 2021 - arxiv.org
The Transformer architecture has been successful across many domains, including natural
language processing, computer vision and speech recognition. In keyword spotting, self …

Broadcasted residual learning for efficient keyword spotting

B Kim, S Chang, J Lee, D Sung - arXiv preprint arXiv:2106.04140, 2021 - arxiv.org
Keyword spotting is an important research field because it plays a key role in device wake-up
and user interaction on smart devices. However, it is challenging to minimize errors while …

ConvMixer: Feature interactive convolution with curriculum learning for small footprint and noisy far-field keyword spotting

D Ng, Y Chen, B Tian, Q Fu… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Building efficient architecture in neural speech processing is paramount to success in
keyword spotting deployment. However, it is very challenging for lightweight models to …

A surrogate gradient spiking baseline for speech command recognition

A Bittar, PN Garner - Frontiers in Neuroscience, 2022 - frontiersin.org
Artificial neural networks (ANNs) are the basis of recent advances in artificial intelligence
(AI); they typically use real-valued neuron responses. By contrast, biological neurons are …

Wav2KWS: Transfer learning from speech representations for keyword spotting

D Seo, HS Oh, Y Jung - IEEE Access, 2021 - ieeexplore.ieee.org
With the expanding development of on-device artificial intelligence, voice-enabled devices
such as smart speakers, wearables, and other on-device or edge processing systems have …

Learning efficient representations for keyword spotting with triplet loss

R Vygon, N Mikhaylovskiy - … 2021, St. Petersburg, Russia, September 27 …, 2021 - Springer
In the past few years, triplet loss-based metric embeddings have become a de-facto
standard for several important computer vision problems, most notably, person …