Speech processing for digital home assistants: Combining signal processing with deep-learning techniques
R Haeb-Umbach, S Watanabe… - IEEE Signal …, 2019 - ieeexplore.ieee.org
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …
Personal VAD: Speaker-conditioned voice activity detection
In this paper, we propose "personal VAD", a system to detect the voice activity of a target
speaker at the frame level. This system is useful for gating the inputs to a streaming on …
Adversarial music: Real world audio adversary against wake-word detection system
Voice Assistants (VAs) such as Amazon Alexa or Google Assistant rely on
wake-word detection to respond to people's commands, which could potentially be vulnerable to …
Small-footprint keyword spotting on raw audio data with sinc-convolutions
S Mittermaier, L Kürzinger… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Keyword Spotting (KWS) enables speech-based user interaction on smart devices. Always-
on and battery-powered application scenarios for smart devices put constraints on hardware …
KENKU: Towards Efficient and Stealthy Black-box Adversarial Attacks against ASR Systems
Prior researchers show that existing automatic speech recognition (ASR) systems are
vulnerable to adversarial examples. Most existing adversarial attacks against ASR systems …
Monophone-based background modeling for two-stage on-device wake word detection
Accurate on-device wake word detection is crucial to products with far-field voice control
such as the Amazon Echo. It is quite challenging to build a wake word system with both low …
End-to-end streaming keyword spotting
We present a system for keyword spotting that, except for a front-end component for feature
generation, it is entirely contained in a deep neural network (DNN) model trained "end-to …
Multi-task learning for speaker verification and voice trigger detection
Automatic speech transcription and speaker recognition are usually treated as separate
tasks even though they are interdependent. In this study, we investigate training a single …
Hardware acceleration for embedded keyword spotting: Tutorial and survey
JSP Giraldo, M Verhelst - ACM Transactions on Embedded Computing …, 2021 - dl.acm.org
In recent years, Keyword Spotting (KWS) has become a crucial human–machine interface
for mobile devices, allowing users to interact more naturally with their gadgets by leveraging …
Frequency domain multi-channel acoustic modeling for distant speech recognition
Conventional far-field automatic speech recognition (ASR) systems typically employ
microphone array techniques for speech enhancement in order to improve robustness …