Multimodal intelligence: Representation learning, information fusion, and applications
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …
An overview of noise-robust automatic speech recognition
New waves of consumer-centric applications, such as voice search and voice interaction
with mobile devices and home entertainment systems, increasingly require automatic …
AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the
largest corpus which is suitable for conducting the speech recognition research and building …
Speaker adaptation of neural network acoustic models using i-vectors
We propose to adapt deep neural network (DNN) acoustic models to a target speaker by
supplying speaker identity vectors (i-vectors) as input features to the network in parallel with …
A time delay neural network architecture for efficient modeling of long temporal contexts
Recurrent neural network architectures have been shown to efficiently model long term
temporal dependencies between acoustic events. However the training time of recurrent …
Adaptation algorithms for neural network-based speech recognition: An overview
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …
Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
P Swietojanski, S Renals - 2014 IEEE Spoken Language …, 2014 - ieeexplore.ieee.org
This paper proposes a simple yet effective model-based neural network speaker adaptation
technique that learns speaker-specific hidden unit contributions given adaptation data …
Learning hidden unit contributions for unsupervised acoustic model adaptation
This work presents a broad study on the adaptation of neural network acoustic models by
means of learning hidden unit contributions (LHUC), a method that linearly re-combines …
Speaker adaptive training of deep neural network acoustic models using i-vectors
In acoustic modeling, speaker adaptive training (SAT) has been a long-standing technique
for the traditional Gaussian mixture models (GMMs). Acoustic models trained with SAT …
JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs
Multi-style training, using data which emulates a variety of possible test scenarios, is a
popular approach towards robust acoustic modeling. However acoustic models capable of …