Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

MetricGAN+: An improved version of MetricGAN for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

SE-Conformer: Time-Domain Speech Enhancement Using Conformer

E Kim, H Seo - Interspeech, 2021 - isca-archive.org
Convolution-augmented transformer (conformer) has recently shown competitive results in
speech-domain applications, such as automatic speech recognition, continuous speech …

FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement

S Zhao, B Ma, KN Watcharasupat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED)
structure and a recurrent structure have achieved promising performance for monaural …

Audio-visual speech codecs: Rethinking audio-visual speech enhancement by re-synthesis

K Yang, D Marković, S Krenn… - Proceedings of the …, 2022 - openaccess.thecvf.com
Since facial actions such as lip movements contain significant information about speech
content, it is not surprising that audio-visual speech enhancement methods are more …

AV-RIR: Audio-visual room impulse response estimation

A Ratnarajah, S Ghosh, S Kumar… - Proceedings of the …, 2024 - openaccess.thecvf.com
Accurate estimation of Room Impulse Response (RIR), which captures an
environment's acoustic properties, is important for speech processing and AR/VR …

Generative models for sound field reconstruction

E Fernandez-Grande, X Karakonstantis… - The Journal of the …, 2023 - pubs.aip.org
This work examines the use of generative adversarial networks for reconstructing sound
fields from experimental data. It is investigated whether generative models, which learn the …

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] …

Improving perceptual quality by phone-fortified perceptual loss using Wasserstein distance for speech enhancement

TA Hsieh, C Yu, SW Fu, X Lu, Y Tsao - arXiv preprint arXiv:2010.15174, 2020 - arxiv.org
Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both
related to a smooth transition in speech segments that may carry linguistic information, e.g. …