Speech processing for digital home assistants: Combining signal processing with deep-learning techniques

R Haeb-Umbach, S Watanabe… - IEEE Signal …, 2019 - ieeexplore.ieee.org
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …

Personal VAD: Speaker-conditioned voice activity detection

S Ding, Q Wang, S Chang, L Wan… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we propose "personal VAD", a system to detect the voice activity of a target
speaker at the frame level. This system is useful for gating the inputs to a streaming on …
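The sketch below is a rough illustration of frame-level, speaker-conditioned gating. It combines a generic per-frame speech probability with a cosine-similarity check against an enrolled target-speaker embedding. This simplified score-combination scheme is assumed for illustration and is not the model proposed in the paper. The function names, the thresholds, and the availability of per-frame speaker embeddings are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personal_vad_gate(
    speech_probs: Sequence[float],                # hypothetical per-frame P(speech) from a generic VAD
    frame_embeddings: Sequence[Sequence[float]],  # hypothetical per-frame speaker embeddings
    target_embedding: Sequence[float],            # enrolled target-speaker embedding
    speech_threshold: float = 0.5,                # hypothetical threshold
    speaker_threshold: float = 0.7,               # hypothetical threshold
) -> List[bool]:
    """Return one flag per frame: True means the frame is passed on to the downstream recognizer."""
    gate: List[bool] = []
    for prob, emb in zip(speech_probs, frame_embeddings):
        is_speech = prob >= speech_threshold
        is_target = cosine(emb, target_embedding) >= speaker_threshold
        # Keep only frames that contain speech AND resemble the target speaker.
        gate.append(is_speech and is_target)
    return gate
```

With three frames whose speech probabilities are 0.1, 0.9, and 0.8, only a frame that is both speech and close to the enrolled embedding is let through. Non-speech frames and other speakers' frames are dropped before they reach the streaming recognizer.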

Adversarial music: Real world audio adversary against wake-word detection system

J Li, S Qu, X Li, J Szurley, JZ Kolter… - Advances in Neural …, 2019 - proceedings.neurips.cc
Voice Assistants (VAs) such as Amazon Alexa or Google Assistant rely on wake-word detection
to respond to people's commands, which could potentially be vulnerable to …

Small-footprint keyword spotting on raw audio data with sinc-convolutions

S Mittermaier, L Kürzinger… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Keyword Spotting (KWS) enables speech-based user interaction on smart devices. Always-
on and battery-powered application scenarios for smart devices put constraints on hardware …
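Sinc-convolutions, as introduced by SincNet, parameterize each first-layer filter of a raw-audio network by only two cutoff frequencies. This keeps the parameter count small. The sketch below assumes that SincNet-style band-pass form: a difference of two ideal low-pass filters, tapered with a Hamming window. It does not reproduce the paper's exact layer. It builds one kernel and applies it to a raw waveform. Cutoffs are normalized frequencies in cycles per sample, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sinc(x: float) -> float:
    """Unnormalized sinc: sin(x) / x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def sinc_bandpass_kernel(f1: float, f2: float, length: int) -> List[float]:
    """Hamming-windowed band-pass kernel passing roughly f1..f2 (cycles/sample)."""
    if length % 2 == 0:
        raise ValueError("kernel length must be odd so the filter is centered")
    if not 0.0 <= f1 < f2 <= 0.5:
        raise ValueError("cutoffs must satisfy 0 <= f1 < f2 <= 0.5")
    half = length // 2
    kernel: List[float] = []
    for i in range(length):
        n = i - half
        # Ideal low-pass at f2 minus ideal low-pass at f1 gives an ideal band-pass.
        band = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        # Hamming window tapers the truncated ideal response to reduce ripple.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        kernel.append(band * window)
    return kernel


def convolve_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' sliding dot product (as in DNN conv layers); the kernel is symmetric,
    so this equals true convolution."""
    k = len(kernel)
    return [sum(signal[t + j] * kernel[j] for j in range(k)) for t in range(len(signal) - k + 1)]
```

In a learnable layer only `f1` and `f2` per filter would be trained, and the kernel would be rebuilt from them on every forward pass. A free-form convolution of the same length would need `length` weights per filter instead.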

KENKU: Towards Efficient and Stealthy Black-box Adversarial Attacks against ASR Systems

X Wu, S Ma, C Shen, C Lin, Q Wang, Q Li… - 32nd USENIX Security …, 2023 - usenix.org
Prior researchers show that existing automatic speech recognition (ASR) systems are
vulnerable to adversarial examples. Most existing adversarial attacks against ASR systems …

Monophone-based background modeling for two-stage on-device wake word detection

M Wu, S Panchapagesan, M Sun, J Gu… - …, Speech and Signal …, 2018 - ieeexplore.ieee.org
Accurate on-device wake word detection is crucial to products with far-field voice control
such as the Amazon Echo. It is quite challenging to build a wake word system with both low …
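The two-stage setup named in the title can be sketched as a cascade. A cheap first-stage detector scores every audio window, and a larger second-stage model runs only on windows the first stage accepts. This bounds how often the expensive model is invoked. The scorers, thresholds, and function name below are hypothetical placeholders rather than the paper's models. The monophone-based background modeling from the title is not shown.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # one audio window; its concrete type is up to the caller


def two_stage_detect(
    windows: Sequence[T],
    stage1: Callable[[T], float],  # hypothetical small, always-on scorer
    stage2: Callable[[T], float],  # hypothetical larger, more accurate scorer
    t1: float = 0.5,               # hypothetical first-stage threshold
    t2: float = 0.8,               # hypothetical second-stage threshold
) -> Tuple[List[int], int]:
    """Return (indices of windows accepted by both stages, number of stage-2 calls)."""
    detections: List[int] = []
    stage2_calls = 0
    for i, window in enumerate(windows):
        if stage1(window) < t1:
            continue  # first stage rejects: the large model never runs on this window
        stage2_calls += 1
        if stage2(window) >= t2:
            detections.append(i)  # both stages agree: report a wake word
    return detections, stage2_calls
```

Lowering `t1` lets more windows reach the second stage, which reduces false rejects at the cost of more second-stage computation. `t2` then mainly controls false accepts.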

End-to-end streaming keyword spotting

R Alvarez, HJ Park - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
We present a system for keyword spotting that, except for a front-end component for feature
generation, it is entirely contained in a deep neural network (DNN) model trained "end-to …

Multi-task learning for speaker verification and voice trigger detection

S Sigtia, E Marchi, S Kajarekar, D Naik… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Automatic speech transcription and speaker recognition are usually treated as separate
tasks even though they are interdependent. In this study, we investigate training a single …

Hardware acceleration for embedded keyword spotting: Tutorial and survey

JSP Giraldo, M Verhelst - ACM Transactions on Embedded Computing …, 2021 - dl.acm.org
In recent years, Keyword Spotting (KWS) has become a crucial human–machine interface
for mobile devices, allowing users to interact more naturally with their gadgets by leveraging …

Frequency domain multi-channel acoustic modeling for distant speech recognition

W Minhua, K Kumatani, S Sundaram… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Conventional far-field automatic speech recognition (ASR) systems typically employ
microphone array techniques for speech enhancement in order to improve robustness …
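Delay-and-sum beamforming is one classic example of the conventional microphone-array enhancement the abstract refers to. Below is a minimal frequency-domain sketch for a uniform linear array. It assumes far-field plane-wave propagation and a speed of sound of 343 m/s. For each STFT bin, every microphone's value is phase-shifted to undo its propagation delay toward the look direction, and then the channels are averaged. This illustrates the conventional front end, not the paper's multi-channel acoustic model. The names and parameters are hypothetical.

```python
import cmath
import math
from typing import List, Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def ula_delays(num_mics: int, spacing_m: float, angle_rad: float,
               c: float = SPEED_OF_SOUND_M_S) -> List[float]:
    """Far-field arrival delay (seconds) at each mic of a uniform linear array,
    relative to mic 0, for a plane wave from angle_rad (0 = endfire)."""
    return [m * spacing_m * math.cos(angle_rad) / c for m in range(num_mics)]


def delay_and_sum_bin(mic_bins: Sequence[complex], freq_hz: float,
                      delays_s: Sequence[float]) -> complex:
    """Beamform one frequency bin: undo each mic's delay, then average."""
    acc = 0j
    for x, tau in zip(mic_bins, delays_s):
        # A delay of tau multiplies the bin by exp(-j*2*pi*f*tau); multiplying by
        # exp(+j*2*pi*f*tau) aligns the channels so the target adds coherently.
        acc += x * cmath.exp(2j * math.pi * freq_hz * tau)
    return acc / len(mic_bins)
```

A source in the look direction passes with unit gain, because its phase-aligned channels add coherently. Sound from other directions stays misaligned and partly cancels. In a full system this would be applied to every bin of every STFT frame before feature extraction.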