Deep spoken keyword spotting: An overview

I López-Espejo, ZH Tan, JHL Hansen, J Jensen - IEEE Access, 2021 - ieeexplore.ieee.org
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …

Training keyword spotting models on non-iid data with federated learning

A Hard, K Partridge, C Nguyen, N Subrahmanya… - arXiv preprint arXiv …, 2020 - arxiv.org
We demonstrate that a production-quality keyword-spotting model can be trained on-device
using federated learning and achieve comparable false accept and false reject rates to a …

Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

A Hard, K Partridge, N Chen, S Augenstein… - arXiv preprint arXiv …, 2022 - arxiv.org
We trained a keyword spotting model using federated learning on real user devices and
observed significant improvements when the model was deployed for inference on phones …

The DKU audio-visual wake word spotting system for the 2021 MISP challenge

M Cheng, H Wang, Y Wang, M Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
This paper describes the system developed by the DKU team for the MISP Challenge 2021.
We present a two-stage approach consisting of end-to-end neural networks for the audio …

Noisy student-teacher training for robust keyword spotting

HJ Park, P Zhu, IL Moreno, N Subrahmanya - arXiv preprint arXiv …, 2021 - arxiv.org
We propose self-training with noisy student-teacher approach for streaming keyword
spotting, that can utilize large-scale unlabeled data and aggressive data augmentation. The …

Two-stage streaming keyword detection and localization with multi-scale depthwise temporal convolution

J Hou, L Xie, S Zhang - Neural Networks, 2022 - Elsevier
A keyword spotting (KWS) system running on smart devices should accurately detect the
appearances and predict the locations of predefined keywords from audio streams, with …

SpeakerStew: Scaling to many languages with a triaged multilingual text-dependent and text-independent speaker verification system

R Chojnacka, J Pelecanos, Q Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we describe SpeakerStew, a hybrid system to perform speaker verification on
46 languages. Two core ideas were explored in this system: (1) Pooling training data of …

Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors

M Rusci, F Paci, M Fariselli, E Flamand… - IEEE Internet of …, 2024 - ieeexplore.ieee.org
This paper proposes a self-learning method to incrementally train (fine-tune) a personalized
Keyword Spotting (KWS) model after the deployment on ultra-low power smart audio …

Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model

HJ Park, D Agarwal, N Chen, R Sun, K Partridge… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper explores the use of TTS synthesized training data for KWS (keyword spotting)
task while minimizing development cost and time. Keyword spotting models require a huge …

Locale Encoding for Scalable Multilingual Keyword Spotting Models

P Zhu, HJ Park, A Park, AS Scarpati… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
A Multilingual Keyword Spotting (KWS) system detects spoken keywords over multiple
locales. Conventional monolingual KWS approaches do not scale well to multilingual …