Sequence noise injected training for end-to-end speech recognition

G Saon, Z Tüske, K Audhkhasi… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
We present a simple noise injection algorithm for training end-to-end ASR models which
consists in adding to the spectra of training utterances the scaled spectra of random …

An Investigation of Mixup Training Strategies for Acoustic Models in ASR.

I Medennikov, YY Khokhlov, A Romanenko, D Popov… - Interspeech, 2018 - isca-archive.org
Mixup is a recently proposed technique that creates virtual training examples by combining
existing ones. It has been successfully used in various machine learning tasks. This paper …
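
The snippet says only that mixup builds virtual examples by combining existing ones. The sketch below is a rough illustration of the generic mixup recipe, not the specific strategies investigated in the paper: two inputs and their targets are blended with the same weight drawn from a Beta(alpha, alpha) distribution. The function name `mixup_pair`, the `alpha` default, and treating a feature frame as a flat list of floats are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup_pair(
    x1: Sequence[float],
    y1: Sequence[float],
    x2: Sequence[float],
    y2: Sequence[float],
    alpha: float = 0.2,  # assumed default; the paper's settings are not given here
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], float]:
    """Blend two examples into one virtual example (generic mixup sketch).

    x1, x2: feature vectors of equal length (hypothetically, one flattened
    acoustic feature frame); y1, y2: one-hot or soft target vectors.
    Returns the mixed input, the mixed target, and the mixing weight.
    """
    if len(x1) != len(x2) or len(y1) != len(y2):
        raise ValueError("paired inputs and targets must have matching lengths")
    rng = rng or random.Random()
    # Mixing weight in [0, 1], drawn from a symmetric Beta distribution.
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the two inputs ...
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    # ... and the same convex combination of their targets.
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x_mix, y_mix, lam


if __name__ == "__main__":
    rng = random.Random(0)
    x, y, lam = mixup_pair([1.0, 2.0], [1.0, 0.0], [3.0, 4.0], [0.0, 1.0], rng=rng)
    print(f"lambda={lam:.3f} x={x} y={y}")
```

Because inputs and targets share one weight, a blend of two one-hot targets remains a valid probability distribution; a small alpha keeps most mixed examples close to one of the two originals.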

Transfer learning in speaker's age and gender recognition

M Markitantov - Speech and Computer: 22nd International Conference …, 2020 - Springer
In this paper, we study the application of a transfer learning approach to the speaker's age and
gender recognition task. Recently, speech analysis systems, which take images of log Mel …

A multi-task learning framework for overcoming the catastrophic forgetting in automatic speech recognition

J Xue, J Han, T Zheng, X Gao, J Guo - arXiv preprint arXiv:1904.08039, 2019 - arxiv.org
Recently, data-driven Automatic Speech Recognition (ASR) systems have achieved
state-of-the-art results, and transfer learning is often used when those existing systems are …

Bayesian and gaussian process neural networks for large vocabulary continuous speech recognition

S Hu, MWY Lam, X Xie, S Liu, J Yu… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
The hidden activation functions inside deep neural networks (DNNs) play a vital role in
learning high-level discriminative features and controlling the information flows to track …
learning high level discriminative features and controlling the information flows to track …

Hard sample mining for the improved retraining of automatic speech recognition

J Xue, J Han, T Zheng, J Guo, B Wu - arXiv preprint arXiv:1904.08031, 2019 - arxiv.org
An effective way to improve the performance of existing Automatic Speech Recognition (ASR)
systems is to retrain them with more and more new training data in the target …

Investigation of stacked deep neural networks and mixture density networks for acoustic-to-articulatory inversion

X Xie, X Liu, T Lee, L Wang - 2018 11th International …, 2018 - ieeexplore.ieee.org
Acoustic-to-articulatory inversion, which predicts articulatory movement from the acoustic
signal, is useful for many applications such as talking heads, speech recognition, and education …

Lecture Video Highlights Detection from Speech

M Song, I Aslan, E Parada-Cabaleiro… - 2024 32nd …, 2024 - ieeexplore.ieee.org
In interpersonal co-located and online teaching, lecturers highlight words and sentences in
their speech in order to implicitly communicate that particular content is important. This …

Speaker-invariant psychological stress detection using attention-based network

HK Shin, H Han, K Byun… - 2020 Asia-Pacific Signal …, 2020 - ieeexplore.ieee.org
When people get stressed in nervous or unfamiliar situations, their speaking styles or
acoustic characteristics change. These changes are particularly emphasized in certain …

Joint bottleneck feature and attention model for speech recognition

L Xingyan, Q Dan - Proceedings of 2018 International Conference on …, 2018 - dl.acm.org
Recently, the attention-based sequence-to-sequence model has become a research hotspot in
speech recognition. The attention model has the problem of slow convergence and poor …