Deep learning for environmentally robust speech recognition: An overview of recent developments
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …
Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR
This study proposes a complex spectral mapping approach for single- and multi-channel
speech enhancement, where deep neural networks (DNNs) are used to predict the real and …
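A minimal NumPy sketch of the core idea: rather than estimating a real-valued magnitude mask (which reuses the noisy phase), complex spectral mapping predicts the real and imaginary parts of the clean STFT directly, so the model enhances phase as well as magnitude. `fake_dnn` is a hypothetical stand-in for the trained network; the shapes and the scalar mask are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy "noisy STFT": 257 frequency bins x 100 frames of complex values.
noisy_stft = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))

def fake_dnn(stft):
    # Placeholder for a trained DNN that maps noisy features to two
    # real-valued outputs; here it is the identity, for illustration only.
    return stft.real, stft.imag

pred_real, pred_imag = fake_dnn(noisy_stft)

# Complex spectral mapping: both magnitude and phase come from the network.
enhanced_mapping = pred_real + 1j * pred_imag

# Contrast: magnitude masking keeps the noisy phase unchanged.
mask = 0.8  # a real-valued mask in [0, 1]; a scalar here for brevity
enhanced_masking = mask * np.abs(noisy_stft) * np.exp(1j * np.angle(noisy_stft))
```

The contrast in the last two lines is the point of the paper's framing: `enhanced_masking` can never correct the phase, while `enhanced_mapping` can.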
NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing
NARA-WPE is a Python software package providing implementations of the weighted
prediction error (WPE) dereverberation algorithm. WPE has been shown to be a highly …
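To make the algorithm concrete, here is a heavily simplified single-channel, single-frequency-bin sketch of one WPE-style iteration: estimate the signal power, solve a power-weighted linear-prediction problem on delayed past frames, and subtract the predicted late reverberation. This is an assumption-laden illustration, not the NARA-WPE API, which handles multi-channel input and several statistics modes.

```python
import numpy as np

def wpe_1ch(y, taps=10, delay=3, iterations=3, eps=1e-8):
    """Simplified single-channel WPE for one frequency bin.

    y: complex array of shape (T,), STFT frames of one frequency bin.
    Returns a dereverberated signal of the same shape.
    """
    T = len(y)
    # Stacked delayed observations: row k holds y shifted by (delay + k) frames.
    Y_tilde = np.zeros((taps, T), dtype=complex)
    for k in range(taps):
        shift = delay + k
        Y_tilde[k, shift:] = y[: T - shift]

    x = y.copy()
    for _ in range(iterations):
        power = np.maximum(np.abs(x) ** 2, eps)   # time-varying power estimate
        Yw = Y_tilde / power                      # weight frames by inverse power
        R = Yw @ Y_tilde.conj().T                 # weighted correlation matrix
        p = Yw @ y.conj()                         # weighted cross-correlation
        g = np.linalg.solve(R + eps * np.eye(taps), p)
        x = y - g.conj() @ Y_tilde                # subtract predicted late reverb
    return x
```

The `delay` parameter keeps the early reflections (which help rather than hurt ASR) out of the prediction, which is the distinguishing feature of WPE over plain linear-prediction dereverberation.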
Beamnet: End-to-end training of a beamformer-supported multi-channel ASR system
This paper presents an end-to-end training approach for a beamformer-supported multi-
channel ASR system. A neural network which estimates masks for a statistically optimum …
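The inference-time half of this pipeline can be sketched as follows: mask estimates are used to accumulate speech and noise spatial covariance matrices, from which an MVDR-style statistically optimum beamformer is derived. This is a per-frequency-bin illustration under assumed shapes; the paper's contribution is training the mask network end-to-end through this step, which the sketch does not show.

```python
import numpy as np

def mvdr_from_masks(Y, speech_mask, noise_mask, eps=1e-8):
    """Mask-driven MVDR beamformer for one frequency bin (illustrative).

    Y: (D, T) multi-channel STFT frames; masks: (T,) values in [0, 1].
    """
    D = Y.shape[0]
    # Mask-weighted spatial covariance matrices for speech and noise.
    phi_ss = (speech_mask * Y) @ Y.conj().T / (speech_mask.sum() + eps)
    phi_nn = (noise_mask * Y) @ Y.conj().T / (noise_mask.sum() + eps)
    phi_nn += eps * np.eye(D)  # diagonal loading for invertibility
    # Steering vector: principal eigenvector of the speech covariance.
    _, vecs = np.linalg.eigh(phi_ss)
    d = vecs[:, -1]
    # MVDR weights: minimize noise power subject to a distortionless response.
    w = np.linalg.solve(phi_nn, d)
    w /= d.conj() @ w + eps
    return w.conj() @ Y  # beamformed output, shape (T,)
```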
Unified architecture for multichannel end-to-end speech recognition with neural beamforming
T Ochiai, S Watanabe, T Hori… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
This paper proposes a unified architecture for end-to-end automatic speech recognition
(ASR) to encompass microphone-array signal processing such as a state-of-the-art neural …
Audio-visual speech separation and dereverberation with a two-stage multimodal network
Background noise, interfering speech and room reverberation frequently distort target
speech in real listening environments. In this study, we address joint speech separation and …
Bridging the gap between monaural speech enhancement and recognition with distortion-independent acoustic modeling
Monaural speech enhancement has made dramatic advances since the introduction of deep
learning a few years ago. Although enhanced speech has been demonstrated to have better …
Unsupervised training of a deep clustering model for multichannel blind source separation
L Drude, D Hasenklever… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
We propose a training scheme to train neural network-based source separation algorithms
from scratch when parallel clean data is unavailable. In particular, we demonstrate that an …
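For context, the generic deep-clustering inference step that such models build on can be sketched in a few lines: a network embeds each time-frequency bin into a vector space, and clustering the embeddings (k-means here) yields per-source binary masks. The embeddings below are placeholders for network outputs; the paper's unsupervised, multichannel training scheme is not reproduced.

```python
import numpy as np

def cluster_masks(embeddings, n_sources=2, n_iter=20, seed=0):
    """K-means over TF-bin embeddings, as in deep-clustering inference.

    embeddings: (n_bins, emb_dim) vectors, normally produced by a trained
    network (any float array works for this sketch).
    Returns binary masks of shape (n_sources, n_bins).
    """
    rng = np.random.default_rng(seed)
    centers = embeddings[rng.choice(len(embeddings), n_sources, replace=False)]
    for _ in range(n_iter):
        # Assign each TF bin to its nearest centroid.
        dists = ((embeddings[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Recompute centroids from their assigned bins.
        for k in range(n_sources):
            if np.any(labels == k):
                centers[k] = embeddings[labels == k].mean(0)
    return np.stack([(labels == k).astype(float) for k in range(n_sources)])
```

Applying each returned mask to the mixture STFT separates the sources, up to the usual speaker-permutation ambiguity.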
Dual application of speech enhancement for automatic speech recognition
In this work, we exploit speech enhancement for improving a recurrent neural network
transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent …
Integration of neural networks and probabilistic spatial models for acoustic blind source separation
L Drude, R Haeb-Umbach - IEEE Journal of Selected Topics in …, 2019 - ieeexplore.ieee.org
We formulate a generic framework for blind source separation (BSS), which allows
integrating data-driven spectro-temporal methods, such as deep clustering and deep …