Speech recognition using deep neural networks: A systematic review

AB Nassif, I Shahin, I Attili, M Azzeh, K Shaalan - IEEE Access, 2019 - ieeexplore.ieee.org
Over the past decades, a tremendous amount of research has been done on the use of
machine learning for speech processing applications, especially speech recognition …

Supervised speech separation based on deep learning: An overview

DL Wang, J Chen - IEEE/ACM Transactions on Audio, Speech …, 2018 - ieeexplore.ieee.org
Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …

DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement

Y Hu, Y Liu, S Lv, M Xing, S Zhang, Y Fu, J Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Speech enhancement has benefited from the success of deep learning in terms of
intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods …

Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

MEAD: A large-scale audio-visual dataset for emotional talking-face generation

K Wang, Q Wu, L Song, Z Yang, W Wu, C Qian… - … on Computer Vision, 2020 - Springer
The synthesis of natural emotional reactions is an essential criterion in vivid talking-face
video generation. This criterion is nevertheless seldom taken into consideration in previous …

PHASEN: A phase-and-harmonics-aware speech enhancement network

D Yin, C Luo, Z Xiong, W Zeng - Proceedings of the AAAI Conference on …, 2020 - aaai.org
Time-frequency (TF) domain masking is a mainstream approach for single-channel speech
enhancement. Recently, focuses have been put to phase prediction in addition to amplitude …

MetricGAN: Generative adversarial networks based black-box metric scores optimization for speech enhancement

SW Fu, CF Liao, Y Tsao, SD Lin - … Conference on Machine …, 2019 - proceedings.mlr.press
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to
directly optimize evaluation metrics of a target task, and thus, may not always guide the …

A regression approach to speech enhancement based on deep neural networks

Y Xu, J Du, LR Dai, CH Lee - IEEE/ACM Transactions on Audio …, 2014 - ieeexplore.ieee.org
In contrast to the conventional minimum mean square error (MMSE)-based noise reduction
techniques, we propose a supervised method to enhance speech by means of finding a …

Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks

H Erdogan, JR Hershey, S Watanabe… - … on Acoustics, Speech …, 2015 - ieeexplore.ieee.org
Separation of speech embedded in non-stationary interference is a challenging problem that
has recently seen dramatic improvements using deep network-based methods. Previous …