Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

An overview of noise-robust automatic speech recognition

J Li, L Deng, Y Gong… - IEEE/ACM Transactions …, 2014 - ieeexplore.ieee.org
New waves of consumer-centric applications, such as voice search and voice interaction
with mobile devices and home entertainment systems, increasingly require automatic …

AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline

H Bu, J Du, X Na, B Wu, H Zheng - … of the oriental chapter of the …, 2017 - ieeexplore.ieee.org
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the
largest corpus which is suitable for conducting the speech recognition research and building …

Speaker adaptation of neural network acoustic models using i-vectors

G Saon, H Soltau, D Nahamoo… - 2013 IEEE Workshop on …, 2013 - ieeexplore.ieee.org
We propose to adapt deep neural network (DNN) acoustic models to a target speaker by
supplying speaker identity vectors (i-vectors) as input features to the network in parallel with …

A time delay neural network architecture for efficient modeling of long temporal contexts

V Peddinti, D Povey, S Khudanpur - Interspeech, 2015 - isca-archive.org
Recurrent neural network architectures have been shown to efficiently model long term
temporal dependencies between acoustic events. However the training time of recurrent …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models

P Swietojanski, S Renals - 2014 IEEE Spoken Language …, 2014 - ieeexplore.ieee.org
This paper proposes a simple yet effective model-based neural network speaker adaptation
technique that learns speaker-specific hidden unit contributions given adaptation data …

Learning hidden unit contributions for unsupervised acoustic model adaptation

P Swietojanski, J Li, S Renals - IEEE/ACM Transactions on …, 2016 - ieeexplore.ieee.org
This work presents a broad study on the adaptation of neural network acoustic models by
means of learning hidden unit contributions (LHUC), a method that linearly re-combines …

Speaker adaptive training of deep neural network acoustic models using i-vectors

Y Miao, H Zhang, F Metze - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org
In acoustic modeling, speaker adaptive training (SAT) has been a long-standing technique
for the traditional Gaussian mixture models (GMMs). Acoustic models trained with SAT …

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs

V Peddinti, G Chen, V Manohar, T Ko… - … IEEE Workshop on …, 2015 - ieeexplore.ieee.org
Multi-style training, using data which emulates a variety of possible test scenarios, is a
popular approach towards robust acoustic modeling. However acoustic models capable of …