HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversaria...

Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier

Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

Cited by 54 Related articles All 2 versions

MetricGAN+: An improved version of MetricGAN for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org

The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

Cited by 207 Related articles All 9 versions

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer

Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Cited by 14 Related articles All 8 versions

SE-Conformer: Time-Domain Speech Enhancement Using Conformer

E Kim, H Seo - Interspeech, 2021 - isca-archive.org

Convolution-augmented transformer (conformer) has recently shown competitive results in
speech-domain applications, such as automatic speech recognition, continuous speech …

Cited by 86 Related articles All 4 versions

FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement

S Zhao, B Ma, KN Watcharasupat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED)
structure and a recurrent structure have achieved promising performance for monaural …

Cited by 65 Related articles All 3 versions

Audio-visual speech codecs: Rethinking audio-visual speech enhancement by re-synthesis

K Yang, D Marković, S Krenn… - Proceedings of the …, 2022 - openaccess.thecvf.com

Since facial actions such as lip movements contain significant information about speech
content, it is not surprising that audio-visual speech enhancement methods are more …

Cited by 36 Related articles All 5 versions

AV-RIR: Audio-visual room impulse response estimation

A Ratnarajah, S Ghosh, S Kumar… - Proceedings of the …, 2024 - openaccess.thecvf.com

Accurate estimation of Room Impulse Response (RIR) which captures an
environment's acoustic properties is important for speech processing and AR/VR …

Cited by 6 Related articles All 4 versions

Generative models for sound field reconstruction

E Fernandez-Grande, X Karakonstantis… - The Journal of the …, 2023 - pubs.aip.org

This work examines the use of generative adversarial networks for reconstructing sound
fields from experimental data. It is investigated whether generative models, which learn the …

Cited by 28 Related articles All 6 versions

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] …

Cited by 70 Related articles All 5 versions

Improving perceptual quality by phone-fortified perceptual loss using Wasserstein distance for speech enhancement

TA Hsieh, C Yu, SW Fu, X Lu, Y Tsao - arXiv preprint arXiv:2010.15174, 2020 - arxiv.org

Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both
related to a smooth transition in speech segments that may carry linguistic information, e.g. …

Cited by 73 Related articles All 10 versions