Synt++: Utilizing imperfect synthetic data to improve speech recognition
With recent advances in speech synthesis, synthetic data is becoming a viable alternative to
real data for training speech recognition models. However, machine learning with synthetic …
Streaming transformer for hardware efficient voice trigger detection and false trigger mitigation
We present a unified and hardware-efficient architecture for two-stage voice trigger detection
(VTD) and false trigger mitigation (FTM) tasks. Two-stage VTD systems of voice assistants …
Text adaptive detection for customizable keyword spotting
Always-on keyword spotting (KWS), i.e., wake word detection, has been widely used in many
voice assistant applications running on smart devices. Although fixed wakeup word …
SiDi KWS: A Large-Scale Multilingual Dataset for Keyword Spotting
MC Meneses, RB Holanda, LV Peres, GD Rocha - INTERSPEECH, 2022 - isca-archive.org
The remainder of this document is organized as follows: Section 2 introduces the concept of
forced alignment, describes the framework Keyword Miner and details the use of that …
Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
Streaming neural network models for fast frame-wise responses to various speech and
sensory signals are widely adopted on resource-constrained platforms. Hence, increasing …
Marginalized beam search algorithms for hierarchical HMMs
X Xu, J Jaldén - IEEE Transactions on Signal Processing, 2024 - ieeexplore.ieee.org
Inferring a state sequence from a sequence of measurements is a fundamental problem in
bioinformatics and natural language processing. The Viterbi and the Beam Search (BS) …
A study of designing compact audio-visual wake word spotting system based on iterative fine-tuning in neural network pruning
Audio-only wake word spotting (WWS) is challenging under noisy conditions due to
the environmental interference in signal transmission. In this paper, we investigate …
SWAN: SubWord Alignment Network for HMM-free word timing estimation in end-to-end automatic speech recognition
End-to-end (E2E) automatic speech recognition (ASR) systems have often exploited pre-trained
hidden Markov model (HMM) systems for word timing estimation (WTE), due to their …
RepCNN: Micro-sized, Mighty Models for Wakeword Detection
Always-on machine learning models require a very low memory and compute footprint. Their
restricted parameter count limits the model's capacity to learn, and the effectiveness of the …
HEiMDaL: Highly Efficient Method for Detection and Localization of Wake-Words
Streaming keyword spotting is a widely used solution for activating voice assistants.
Methods based on Deep Neural Networks with Hidden Markov Model (DNN-HMM) have …