Deep learning for environmentally robust speech recognition: An overview of recent developments
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …
Far-field automatic speech recognition
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …
Internal language model estimation for domain-adaptive end-to-end speech recognition
The external language models (LM) integration remains a challenging task for end-to-end
(E2E) automatic speech recognition (ASR) which has no clear division between acoustic …
FaSNet: Low-latency adaptive beamforming for multi-microphone audio processing
Beamforming has been extensively investigated for multi-channel audio processing tasks.
Recently, learning-based beamforming methods, sometimes called neural beamformers …
Neural spectrospatial filtering
As the most widely-used spatial filtering approach for multi-channel speech separation,
beamforming extracts the target speech signal arriving from a specific direction. An …
Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation
Domain mismatch between training and testing can lead to significant degradation in
performance in many machine learning scenarios. Unfortunately, this is not a rare situation …
LCANet: End-to-end lipreading with cascaded attention-CTC
Machine lipreading is a special type of automatic speech recognition (ASR) which
transcribes human speech by visually interpreting the movement of related face regions …
Speaker-invariant training via adversarial learning
We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the
inter-talker feature variability while maximizing its senone discriminability so as to enhance …
Conditional teacher-student learning
The teacher-student (T/S) learning has been shown to be effective for a variety of problems
such as domain adaptation and model compression. One shortcoming of the T/S learning is …
Internal language model training for domain-adaptive end-to-end speech recognition
The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …