Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features.

G Saon, Z Tüske, K Audhkhasi… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

We present a simple noise injection algorithm for training end-to-end ASR models which
consists in adding to the spectra of training utterances the scaled spectra of random …

Cited by 41 · Related articles · All 2 versions

An Investigation of Mixup Training Strategies for Acoustic Models in ASR.

I Medennikov, YY Khokhlov, A Romanenko, D Popov… - Interspeech, 2018 - isca-archive.org

Mixup is a recently proposed technique that creates virtual training examples by combining
existing ones. It has been successfully used in various machine learning tasks. This paper …

Cited by 33 · Related articles · All 8 versions

Transfer learning in speaker's age and gender recognition

M Markitantov - Speech and Computer: 22nd International Conference …, 2020 - Springer

In this paper, we study an application of transfer learning approach to speaker's age and
gender recognition task. Recently, speech analysis systems, which take images of log Mel …

Cited by 16 · Related articles · All 3 versions

A multi-task learning framework for overcoming the catastrophic forgetting in automatic speech recognition

J Xue, J Han, T Zheng, X Gao, J Guo - arXiv preprint arXiv:1904.08039, 2019 - arxiv.org

Recently, data-driven based Automatic Speech Recognition (ASR) systems have achieved
state-of-the-art results. And transfer learning is often used when those existing systems are …

Cited by 15 · Related articles · All 3 versions

Bayesian and gaussian process neural networks for large vocabulary continuous speech recognition

S Hu, MWY Lam, X Xie, S Liu, J Yu… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

The hidden activation functions inside deep neural networks (DNNs) play a vital role in
learning high level discriminative features and controlling the information flows to track …

Cited by 16 · Related articles · All 3 versions

Hard sample mining for the improved retraining of automatic speech recognition

J Xue, J Han, T Zheng, J Guo, B Wu - arXiv preprint arXiv:1904.08031, 2019 - arxiv.org

It is an effective way that improves the performance of the existing Automatic Speech
Recognition (ASR) systems by retraining with more and more new training data in the target …

Cited by 11 · Related articles · All 3 versions

Investigation of stacked deep neural networks and mixture density networks for acoustic-to-articulatory inversion

X Xie, X Liu, T Lee, L Wang - 2018 11th International …, 2018 - ieeexplore.ieee.org

Acoustic-to-articulatory inversion predicting articulatory movement based on the acoustic
signal is useful for many applications like talking head, speech recognition, and education …

Cited by 10 · Related articles · All 2 versions

Lecture Video Highlights Detection from Speech

M Song, I Aslan, E Parada-Cabaleiro… - 2024 32nd …, 2024 - ieeexplore.ieee.org

In interpersonal co-located and online teaching, lecturers highlight words and sentences in
their speech in order to implicitly communicate that particular content is important. This …

Speaker-invariant psychological stress detection using attention-based network

HK Shin, H Han, K Byun… - 2020 Asia-Pacific Signal …, 2020 - ieeexplore.ieee.org

When people get stressed in nervous or unfamiliar situations, their speaking styles or
acoustic characteristics change. These changes are particularly emphasized in certain …

Cited by 8 · Related articles · All 3 versions

Joint bottleneck feature and attention model for speech recognition

L Xingyan, Q Dan - Proceedings of 2018 International Conference on …, 2018 - dl.acm.org

Recently, attention based sequence-to-sequence model become a research hotspot in
speech recognition. The attention model has the problem of slow convergence and poor …

Cited by 5 · Related articles