Synt++: Utilizing imperfect synthetic data to improve speech recognition

TY Hu, M Armandpour, A Shrivastava… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
With recent advances in speech synthesis, synthetic data is becoming a viable alternative to
real data for training speech recognition models. However, machine learning with synthetic …

Streaming transformer for hardware efficient voice trigger detection and false trigger mitigation

V Garg, W Chang, S Sigtia, S Adya, P Simha… - arXiv preprint arXiv …, 2021 - arxiv.org
We present a unified and hardware efficient architecture for two stage voice trigger detection
(VTD) and false trigger mitigation (FTM) tasks. Two stage VTD systems of voice assistants …

Text adaptive detection for customizable keyword spotting

Y Xi, T Tan, W Zhang, B Yang… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Always-on keyword spotting (KWS), i.e., wake word detection, has been widely used in many
voice assistant applications running on smart devices. Although fixed wakeup word …

SiDi KWS: A Large-Scale Multilingual Dataset for Keyword Spotting.

MC Meneses, RB Holanda, LV Peres, GD Rocha - INTERSPEECH, 2022 - isca-archive.org
The remainder of this document is organized as follows: section 2 introduces the concept of
forced alignment, describes the framework Keyword Miner and details the use of that …

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

UO Sarawgi, J Berkowitz, V Garg… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Streaming neural network models for fast frame-wise responses to various speech and
sensory signals are widely adopted on resource-constrained platforms. Hence, increasing …

Marginalized beam search algorithms for hierarchical HMMs

X Xu, J Jaldén - IEEE Transactions on Signal Processing, 2024 - ieeexplore.ieee.org
Inferring a state sequence from a sequence of measurements is a fundamental problem in
bioinformatics and natural language processing. The Viterbi and the Beam Search (BS) …

A study of designing compact audio-visual wake word spotting system based on iterative fine-tuning in neural network pruning

H Zhou, J Du, CHH Yang, S Xiong… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Audio-only based wake word spotting (WWS) is challenging under noisy conditions due to
the environmental interference in signal transmission. In this paper, we investigate on …

SWAN: SubWord Alignment Network for HMM-free word timing estimation in end-to-end automatic speech recognition

W Kang, S Vishnubhotla, R Aktas, Y Virkar, R Peri… - 2024 - amazon.science
End-to-end (E2E) automatic speech recognition (ASR) systems often exploited pre-trained
hidden Markov model (HMM) systems for word timing estimation (WTE), due to their …

RepCNN: Micro-sized, Mighty Models for Wakeword Detection

A Kundu, P Nayak, P Padmanabhan, D Naik - arXiv preprint arXiv …, 2024 - arxiv.org
Always-on machine learning models require a very low memory and compute footprint. Their
restricted parameter count limits the model's capacity to learn, and the effectiveness of the …

HEiMDaL: Highly Efficient Method for Detection and Localization of Wake-Words

A Kundu, M Samragh, M Cho… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Streaming keyword spotting is a widely used solution for activating voice assistants.
Methods based on Deep Neural Networks with Hidden Markov Model (DNN-HMM) have …