Sequence noise injected training for end-to-end speech recognition
We present a simple noise injection algorithm for training end-to-end ASR models which
consists in adding to the spectra of training utterances the scaled spectra of random …
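The noise injection step in the snippet above can be sketched in NumPy. The snippet is truncated before it says where the injected spectra come from. This sketch assumes they are the spectra of other randomly chosen utterances in the same padded batch. The function name and the `weight` and `prob` parameters are hypothetical illustrations, not the paper's published settings.

```python
import numpy as np


def sequence_noise_injection(batch_spectra, weight=0.4, prob=1.0, rng=None):
    """Hypothetical sketch of sequence noise injection on a padded batch.

    batch_spectra: float array of shape (batch, time, freq); all utterances
        are assumed to be padded or cropped to the same number of frames.
    weight: scale applied to the injected spectra (illustrative value).
    prob: probability that a given utterance receives injected noise.
    Returns a new array; the input is not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = batch_spectra.shape[0]
    # Assumption: the "noise" for each utterance is the spectrum of another
    # utterance drawn at random from the same batch (occasionally itself).
    partners = rng.integers(0, batch, size=batch)
    # Decide independently for each utterance whether to inject noise.
    apply = rng.random(batch) < prob
    noisy = batch_spectra.copy()
    # Add the scaled partner spectra; reads come from the unmodified input.
    noisy[apply] += weight * batch_spectra[partners[apply]]
    return noisy
```

Only the input features change in this sketch. Each utterance is assumed to keep its own transcript as the training target.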
An Investigation of Mixup Training Strategies for Acoustic Models in ASR
I Medennikov, YY Khokhlov, A Romanenko, D Popov… - Interspeech, 2018 - isca-archive.org
Mixup is a recently proposed technique that creates virtual training examples by combining
existing ones. It has been successfully used in various machine learning tasks. This paper …
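The standard mixup formulation creates each virtual example in two steps. It draws a mixing weight λ from a Beta(α, α) distribution, then forms the same convex combination of two inputs and of their targets. A minimal NumPy sketch for frame-level features with one-hot or soft targets follows. It illustrates generic mixup only, not the specific training strategies investigated in the paper. The function name and the default `alpha` are illustrative assumptions.

```python
import numpy as np


def mixup(features, targets, alpha=0.2, rng=None):
    """Generic mixup sketch for frame-level acoustic-model training.

    features: float array of shape (batch, feat_dim), e.g. filterbank frames.
    targets: float array of shape (batch, num_classes), one-hot or soft labels.
    alpha: Beta(alpha, alpha) parameter controlling the mixing weight.
    Returns the mixed features, the mixed targets, and the mixing weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing weight per batch, drawn from a symmetric Beta distribution.
    lam = rng.beta(alpha, alpha)
    # Pair every example with another one via a random permutation.
    perm = rng.permutation(features.shape[0])
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    # Targets are mixed with the same weight, so each row still sums to 1.
    mixed_y = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_x, mixed_y, lam
```

With a small `alpha`, most draws of λ lie close to 0 or 1. As a result, most virtual examples stay near one of the two originals.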
Transfer learning in speaker's age and gender recognition
M Markitantov - Speech and Computer: 22nd International Conference …, 2020 - Springer
In this paper, we study the application of a transfer learning approach to the speaker age and
gender recognition task. Recently, speech analysis systems, which take images of log Mel …
A multi-task learning framework for overcoming the catastrophic forgetting in automatic speech recognition
J Xue, J Han, T Zheng, X Gao, J Guo - arXiv preprint arXiv:1904.08039, 2019 - arxiv.org
Recently, data-driven Automatic Speech Recognition (ASR) systems have achieved
state-of-the-art results, and transfer learning is often used when those existing systems are …
Bayesian and Gaussian process neural networks for large vocabulary continuous speech recognition
The hidden activation functions inside deep neural networks (DNNs) play a vital role in
learning high-level discriminative features and controlling the information flows to track …
Hard sample mining for the improved retraining of automatic speech recognition
J Xue, J Han, T Zheng, J Guo, B Wu - arXiv preprint arXiv:1904.08031, 2019 - arxiv.org
An effective way to improve the performance of existing Automatic Speech
Recognition (ASR) systems is to retrain them with more and more new training data in the target …
Investigation of stacked deep neural networks and mixture density networks for acoustic-to-articulatory inversion
Acoustic-to-articulatory inversion, which predicts articulatory movements from the acoustic
signal, is useful for many applications such as talking heads, speech recognition, and education …
Lecture Video Highlights Detection from Speech
M Song, I Aslan, E Parada-Cabaleiro… - 2024 32nd …, 2024 - ieeexplore.ieee.org
In interpersonal co-located and online teaching, lecturers highlight words and sentences in
their speech in order to implicitly communicate that particular content is important. This …
Speaker-invariant psychological stress detection using attention-based network
When people get stressed in nervous or unfamiliar situations, their speaking styles or
acoustic characteristics change. These changes are particularly emphasized in certain …
Joint bottleneck feature and attention model for speech recognition
L Xingyan, Q Dan - Proceedings of 2018 International Conference on …, 2018 - dl.acm.org
Recently, attention-based sequence-to-sequence models have become a research hotspot in
speech recognition. The attention model suffers from slow convergence and poor …