A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Deep learning classifiers for hyperspectral imaging: A review

ME Paoletti, JM Haut, J Plaza, A Plaza - ISPRS Journal of Photogrammetry …, 2019 - Elsevier
Advances in computing technology have fostered the development of new and powerful
deep learning (DL) techniques, which have demonstrated promising results in a wide range …

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement

Y Hu, Y Liu, S Lv, M Xing, S Zhang, Y Fu, J Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Speech enhancement has benefited from the success of deep learning in terms of
intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods …

Real time speech enhancement in the waveform domain

A Défossez, G Synnaeve, Y Adi - arXiv preprint arXiv:2006.12847, 2020 - arxiv.org
We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …

Speech recognition using deep neural networks: A systematic review

AB Nassif, I Shahin, I Attili, M Azzeh, K Shaalan - IEEE Access, 2019 - ieeexplore.ieee.org
Over the past decades, a tremendous amount of research has been done on the use of
machine learning for speech processing applications, especially speech recognition …

Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation

Y Luo, N Mesgarani - IEEE/ACM transactions on audio, speech …, 2019 - ieeexplore.ieee.org
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …

Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

On mean absolute error for deep neural network based vector-to-vector regression

J Qi, J Du, SM Siniscalchi, X Ma… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org
In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for
the deep neural network (DNN) based vector-to-vector regression. The goal of this work is …

The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, subjective testing framework, and challenge results

CKA Reddy, V Gopal, R Cutler, E Beyrami… - arXiv preprint arXiv …, 2020 - arxiv.org
The INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge is intended to
promote collaborative research in real-time single-channel Speech Enhancement aimed to …