Direct modeling of raw audio with DNNs for wake word detection

Speech processing for digital home assistants: Combining signal processing with deep-learning techniques

R Haeb-Umbach, S Watanabe… - IEEE Signal …, 2019 - ieeexplore.ieee.org

Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …

Cited by 174 · Related articles · All 9 versions

Personal VAD: Speaker-conditioned voice activity detection

S Ding, Q Wang, S Chang, L Wan… - arXiv preprint arXiv …, 2019 - arxiv.org

In this paper, we propose "personal VAD", a system to detect the voice activity of a target
speaker at the frame level. This system is useful for gating the inputs to a streaming on …

Cited by 83 · Related articles · All 7 versions

Adversarial music: Real world audio adversary against wake-word detection system

J Li, S Qu, X Li, J Szurley, JZ Kolter… - Advances in Neural …, 2019 - proceedings.neurips.cc

Voice Assistants (VAs) such as Amazon Alexa or Google Assistant rely on wake-word
detection to respond to people's commands, which could potentially be vulnerable to …

Cited by 76 · Related articles · All 10 versions

Small-footprint keyword spotting on raw audio data with sinc-convolutions

S Mittermaier, L Kürzinger… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Keyword Spotting (KWS) enables speech-based user interaction on smart devices.
Always-on and battery-powered application scenarios for smart devices put constraints on hardware …

Cited by 73 · Related articles · All 4 versions

KENKU: Towards Efficient and Stealthy Black-box Adversarial Attacks against ASR Systems

X Wu, S Ma, C Shen, C Lin, Q Wang, Q Li… - 32nd USENIX Security …, 2023 - usenix.org

Prior researchers show that existing automatic speech recognition (ASR) systems are
vulnerable to adversarial examples. Most existing adversarial attacks against ASR systems …

Cited by 5 · Related articles · All 4 versions

Monophone-based background modeling for two-stage on-device wake word detection

M Wu, S Panchapagesan, M Sun, J Gu… - … , Speech and Signal …, 2018 - ieeexplore.ieee.org

Accurate on-device wake word detection is crucial to products with far-field voice control
such as the Amazon Echo. It is quite challenging to build a wake word system with both low …

Cited by 87 · Related articles · All 5 versions

End-to-end streaming keyword spotting

R Alvarez, HJ Park - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org

We present a system for keyword spotting that, except for a front-end component for feature
generation, it is entirely contained in a deep neural network (DNN) model trained "end-to …

Cited by 71 · Related articles · All 3 versions

Multi-task learning for speaker verification and voice trigger detection

S Sigtia, E Marchi, S Kajarekar, D Naik… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Automatic speech transcription and speaker recognition are usually treated as separate
tasks even though they are interdependent. In this study, we investigate training a single …

Cited by 55 · Related articles · All 6 versions

Hardware acceleration for embedded keyword spotting: Tutorial and survey

JSP Giraldo, M Verhelst - ACM Transactions on Embedded Computing …, 2021 - dl.acm.org

In recent years, Keyword Spotting (KWS) has become a crucial human–machine interface
for mobile devices, allowing users to interact more naturally with their gadgets by leveraging …

Cited by 7 · Related articles

Frequency domain multi-channel acoustic modeling for distant speech recognition

W Minhua, K Kumatani, S Sundaram… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

Conventional far-field automatic speech recognition (ASR) systems typically employ
microphone array techniques for speech enhancement in order to improve robustness …

Cited by 51 · Related articles · All 10 versions