Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic in automatic speech recognition, yet it remains an important challenge …

Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR

ZQ Wang, P Wang, DL Wang - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org
This study proposes a complex spectral mapping approach for single- and multi-channel
speech enhancement, where deep neural networks (DNNs) are used to predict the real and …
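
As a rough illustration of the complex spectral mapping setup described above (a sketch under assumptions, not the authors' code): the network's input and regression target are the stacked real and imaginary STFT components, so enhancement reduces to real-valued regression that can be inverted back to a complex spectrogram. Function names and shapes here are illustrative.

```python
import numpy as np

def complex_mapping_features(Y):
    """Stack real and imaginary parts of an STFT as a real-valued map.

    Y: (F, T) complex STFT. Returns a (2F, T) real array. In complex
    spectral mapping, a DNN consumes this representation of the noisy
    mixture and regresses the same representation of the clean speech.
    """
    return np.concatenate([Y.real, Y.imag], axis=0)

def reconstruct_complex(X):
    """Invert the stacking: (2F, T) real -> (F, T) complex STFT."""
    F = X.shape[0] // 2
    return X[:F] + 1j * X[F:]
```

The round trip is lossless, which is the point of mapping real and imaginary parts directly rather than magnitude alone: phase is carried through the regression target.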

NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing

L Drude, J Heymann, C Boeddeker… - … 13th ITG-Symposium, 2018 - ieeexplore.ieee.org
NARA-WPE is a Python software package providing implementations of the weighted
prediction error (WPE) dereverberation algorithm. WPE has been shown to be a highly …
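
WPE removes late reverberation by variance-normalized delayed linear prediction, applied independently per STFT frequency bin. A minimal single-channel numpy sketch of that iteration follows (this is not the NARA-WPE implementation; parameter names and defaults are assumptions):

```python
import numpy as np

def wpe_single_band(y, taps=10, delay=3, iterations=3, eps=1e-10):
    """Single-channel WPE for one STFT frequency bin.

    y: complex array of shape (T,), the reverberant observation.
    Returns the dereverberated signal of the same shape.
    """
    T = len(y)
    # Delayed, stacked tap matrix: row k holds y shifted by (delay + k).
    Y_tilde = np.zeros((taps, T), dtype=complex)
    for k in range(taps):
        shift = delay + k
        Y_tilde[k, shift:] = y[:T - shift]
    x = y.copy()
    for _ in range(iterations):
        # Time-varying power estimate of the current desired signal.
        lam = np.maximum(np.abs(x) ** 2, eps)
        # Variance-weighted correlation statistics.
        Yw = Y_tilde / lam                       # (taps, T)
        R = Yw @ Y_tilde.conj().T                # (taps, taps)
        r = Yw @ y.conj()                        # (taps,)
        g = np.linalg.solve(R + eps * np.eye(taps), r)
        # Subtract the predicted late reverberation.
        x = y - g.conj() @ Y_tilde
    return x
```

The `delay` parameter keeps the prediction from cancelling the direct path and early reflections; only taps at least `delay` frames in the past contribute to the estimate of the late reverberation.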

Beamnet: End-to-end training of a beamformer-supported multi-channel ASR system

J Heymann, L Drude, C Boeddeker… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org
This paper presents an end-to-end training approach for a beamformer-supported multi-
channel ASR system. A neural network which estimates masks for a statistically optimum …
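
In mask-based beamformers of this family, the statistically optimum filter is computed from spatial covariance matrices weighted by the network's speech and noise masks. The numpy sketch below uses a Souden-style MVDR formulation as one common choice of optimum beamformer (a sketch under assumptions, not the paper's system; names and shapes are illustrative):

```python
import numpy as np

def mask_based_mvdr(Y, speech_mask, noise_mask, ref=0, eps=1e-8):
    """Mask-driven MVDR beamforming for one frequency bin.

    Y: (D, T) complex multi-channel STFT at one frequency.
    speech_mask, noise_mask: (T,) real masks in [0, 1], e.g. from a NN.
    Returns the (T,) beamformed output.
    """
    D, T = Y.shape
    # Mask-weighted spatial covariance matrices.
    phi_s = (Y * speech_mask) @ Y.conj().T / max(speech_mask.sum(), eps)
    phi_n = (Y * noise_mask) @ Y.conj().T / max(noise_mask.sum(), eps)
    phi_n += eps * np.eye(D)  # regularize for invertibility
    # Souden MVDR: w = Phi_n^{-1} Phi_s u_ref / trace(Phi_n^{-1} Phi_s)
    num = np.linalg.solve(phi_n, phi_s)
    w = num[:, ref] / max(np.trace(num).real, eps)
    return w.conj() @ Y
```

Because the beamformer weights are a differentiable function of the masks, gradients from an ASR loss can flow back through this computation into the mask-estimation network, which is what makes end-to-end training of the combined system possible.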

Unified architecture for multichannel end-to-end speech recognition with neural beamforming

T Ochiai, S Watanabe, T Hori… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
This paper proposes a unified architecture for end-to-end automatic speech recognition
(ASR) to encompass microphone-array signal processing such as a state-of-the-art neural …


Audio-visual speech separation and dereverberation with a two-stage multimodal network

K Tan, Y Xu, SX Zhang, M Yu… - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Background noise, interfering speech and room reverberation frequently distort target
speech in real listening environments. In this study, we address joint speech separation and …

Bridging the gap between monaural speech enhancement and recognition with distortion-independent acoustic modeling

P Wang, K Tan - IEEE/ACM Transactions on Audio, Speech …, 2019 - ieeexplore.ieee.org
Monaural speech enhancement has made dramatic advances since the introduction of deep
learning a few years ago. Although enhanced speech has been demonstrated to have better …

Unsupervised training of a deep clustering model for multichannel blind source separation

L Drude, D Hasenklever… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
We propose a training scheme to train neural network-based source separation algorithms
from scratch when parallel clean data is unavailable. In particular, we demonstrate that an …
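
Deep clustering maps each time-frequency bin to an embedding vector and separates sources by clustering those embeddings. A minimal numpy sketch of the clustering-to-mask step is below, with plain k-means standing in for the clustering; all names are assumptions, and this is not the paper's training scheme:

```python
import numpy as np

def embeddings_to_masks(V, n_sources=2, n_iter=20):
    """Cluster T-F embedding vectors into per-source binary masks.

    V: (N, E) array of embeddings, one per time-frequency bin (N = T*F).
    Returns masks of shape (n_sources, N) with one-hot columns.
    """
    # Farthest-point initialization keeps this sketch deterministic.
    centers = V[[0]]
    for _ in range(1, n_sources):
        d = ((V[:, None, :] - centers[None]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, V[d.argmax()]])
    for _ in range(n_iter):
        # Assign each bin to its nearest centroid, then update centroids.
        d = ((V[:, None, :] - centers[None]) ** 2).sum(-1)  # (N, K)
        assign = d.argmin(1)
        centers = np.vstack([V[assign == k].mean(0) if (assign == k).any()
                             else centers[k] for k in range(n_sources)])
    masks = np.zeros((n_sources, len(V)))
    masks[assign, np.arange(len(V))] = 1.0
    return masks
```

The resulting binary masks are applied to the mixture spectrogram, one per source; in multichannel settings the cluster assignments can instead be supplied or refined by a probabilistic spatial model, which is what enables training without parallel clean data.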

Dual application of speech enhancement for automatic speech recognition

A Pandey, C Liu, Y Wang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
In this work, we exploit speech enhancement to improve a recurrent neural network
transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent …

Integration of neural networks and probabilistic spatial models for acoustic blind source separation

L Drude, R Haeb-Umbach - IEEE Journal of Selected Topics in …, 2019 - ieeexplore.ieee.org
We formulate a generic framework for blind source separation (BSS), which allows
integrating data-driven spectro-temporal methods, such as deep clustering and deep …