Mixture of inference networks for VAE-based audio-visual speech enhancement

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 263 - Related articles - All 6 versions

VoViT: Low latency graph-based audio-visual voice separation transformer

JF Montesinos, VS Kadandale, G Haro - European Conference on …, 2022 - Springer

This paper presents an audio-visual approach for voice separation which produces state-of-
the-art results at a low latency in two scenarios: speech and singing voice. The model is …

Cited by 21 - Related articles - All 8 versions

SRTNet: Time domain speech enhancement via stochastic refinement

Z Qiu, M Fu, Y Yu, LL Yin, F Sun… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Diffusion model, as a new generative model which is very popular in image generation and
audio synthesis, is rarely used in speech enhancement. In this paper, we use the diffusion …

Cited by 14 - Related articles - All 5 versions

SE-Bridge: Speech enhancement with consistent Brownian bridge

Z Qiu, M Fu, F Sun, G Altenbek, H Huang - arXiv preprint arXiv:2305.13796, 2023 - arxiv.org

We propose SE-Bridge, a novel method for speech enhancement (SE). After recently
applying the diffusion models to speech enhancement, we can achieve speech …

Cited by 4 - Related articles - All 2 versions

Audio–text retrieval based on contrastive learning and collaborative attention mechanism

T Hu, X Xiang, J Qin, Y Tan - Multimedia Systems, 2023 - Springer

Existing research on audio–text retrieval is limited by the size of the dataset and the structure
of the network, making it difficult to learn the ideal features of audio and text resulting in low …

Cited by 6 - Related articles - All 5 versions

Audio-visual speech enhancement with a deep Kalman filter generative model

A Golmakani, M Sadeghi… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Deep latent variable generative models based on variational autoencoder (VAE) have
shown promising performance for audio-visual speech enhancement (AVSE). The …

Cited by 3 - Related articles - All 9 versions

Driver identification using deep generative model with limited data

H Hu, J Liu, G Chen, Y Zhao, Z Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

The scarcity of driving data constrains the accuracy of deep learning (DL)-based driver
identification methods in practical application scenarios. To address this issue, this study …

Cited by 3 - Related articles - All 3 versions

Switching variational auto-encoders for noise-agnostic audio-visual speech enhancement

M Sadeghi, X Alameda-Pineda - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Recently, audio-visual speech enhancement has been tackled in the unsupervised settings
based on variational auto-encoders (VAEs), where during training only clean data is used to …

Cited by 9 - Related articles - All 26 versions

Public-private attributes-based variational adversarial network for audio-visual cross-modal matching

A Zheng, F Yuan, H Zhang, J Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Existing audio-visual cross-modal matching methods focus on mitigating cross-modal
heterogeneity but ignore the impact of intra-class discrepancy of the same identity in …

Deep variational generative models for audio-visual speech separation

VN Nguyen, M Sadeghi, E Ricci… - 2021 IEEE 31st …, 2021 - ieeexplore.ieee.org

In this paper, we are interested in audio-visual speech separation given a single-channel
audio recording as well as visual information (lips movements) associated with each …

Cited by 10 - Related articles - All 6 versions