Speech recognition using deep neural networks: A systematic review
Over the past decades, a tremendous amount of research has been done on the use of
machine learning for speech processing applications, especially speech recognition …
Supervised speech separation based on deep learning: An overview
Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …
DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement
Speech enhancement has benefited from the success of deep learning in terms of
intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods …
Deep learning for audio signal processing
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …
MEAD: A large-scale audio-visual dataset for emotional talking-face generation
The synthesis of natural emotional reactions is an essential criterion in vivid talking-face
video generation. This criterion is nevertheless seldom taken into consideration in previous …
PHASEN: A phase-and-harmonics-aware speech enhancement network
Time-frequency (TF) domain masking is a mainstream approach for single-channel speech
enhancement. Recently, focuses have been put to phase prediction in addition to amplitude …
MetricGAN: Generative adversarial networks based black-box metric scores optimization for speech enhancement
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to
directly optimize evaluation metrics of a target task, and thus, may not always guide the …
A regression approach to speech enhancement based on deep neural networks
In contrast to the conventional minimum mean square error (MMSE)-based noise reduction
techniques, we propose a supervised method to enhance speech by means of finding a …
Deep learning for environmentally robust speech recognition: An overview of recent developments
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …
Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks
Separation of speech embedded in non-stationary interference is a challenging problem that
has recently seen dramatic improvements using deep network-based methods. Previous …