Generative adversarial networks for speech processing: A review
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …
MetricGAN+: An improved version of MetricGAN for speech enhancement
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …
Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis
P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …
SE-Conformer: Time-Domain Speech Enhancement Using Conformer
E Kim, H Seo - Interspeech, 2021 - isca-archive.org
Convolution-augmented transformer (conformer) has recently shown competitive results in
speech-domain applications, such as automatic speech recognition, continuous speech …
FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement
S Zhao, B Ma, KN Watcharasupat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED)
structure and a recurrent structure have achieved promising performance for monaural …
Audio-visual speech codecs: Rethinking audio-visual speech enhancement by re-synthesis
K Yang, D Marković, S Krenn… - Proceedings of the …, 2022 - openaccess.thecvf.com
Since facial actions such as lip movements contain significant information about speech
content, it is not surprising that audio-visual speech enhancement methods are more …
AV-RIR: Audio-visual room impulse response estimation
Accurate estimation of Room Impulse Response (RIR) which captures an
environment's acoustic properties is important for speech processing and AR/VR …
Generative models for sound field reconstruction
E Fernandez-Grande, X Karakonstantis… - The Journal of the …, 2023 - pubs.aip.org
This work examines the use of generative adversarial networks for reconstructing sound
fields from experimental data. It is investigated whether generative models, which learn the …
CDPAM: Contrastive learning for perceptual audio similarity
Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al.[1] …
Improving perceptual quality by phone-fortified perceptual loss using Wasserstein distance for speech enhancement
Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both
related to a smooth transition in speech segments that may carry linguistic information, e.g. …