An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Learning to separate object sounds by watching unlabeled video

R Gao, R Feris, K Grauman - Proceedings of the European …, 2018 - openaccess.thecvf.com
Perceiving a scene most fully requires all the senses. Yet modeling how objects look and
sound is challenging: most natural scenes and events contain multiple objects, and the …

Audio-visual speech enhancement using multimodal deep convolutional neural networks

JC Hou, SS Wang, YH Lai, Y Tsao… - … on Emerging Topics …, 2018 - ieeexplore.ieee.org
Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques
focus only on addressing audio information. In this paper, inspired by multimodal learning …

Audiovisual speech source separation: An overview of key methodologies

B Rivet, W Wang, SM Naqvi… - IEEE Signal Processing …, 2014 - ieeexplore.ieee.org
The separation of speech signals measured at multiple microphones in noisy and
reverberant environments using only the audio modality has limitations because there is …

Audiovisual fusion: Challenges and new approaches

AK Katsaggelos, S Bahaadini… - Proceedings of the …, 2015 - ieeexplore.ieee.org
In this paper, we review recent results on audiovisual (AV) fusion. We also discuss some of
the challenges and report on approaches to address them. One important issue in AV fusion …

Audio-visual speaker diarization based on spatiotemporal Bayesian fusion

ID Gebru, S Ba, X Li, R Horaud - IEEE Transactions on Pattern …, 2017 - ieeexplore.ieee.org
Speaker diarization consists of assigning speech signals to people engaged in a dialogue.
An audio-visual spatiotemporal diarization model is proposed. The model is well suited for …

Audio–visual deep clustering for speech separation

R Lu, Z Duan, C Zhang - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
Speech separation aims to separate individual voices from an audio mixture of multiple
simultaneous talkers. Audio-only approaches show unsatisfactory performance when the …

Dynamic key-updating: Privacy-preserving authentication for RFID systems

L Lu, J Han, L Hu, LM Ni - International Journal of …, 2012 - journals.sagepub.com
The objective of private authentication for Radio Frequency Identification (RFID) systems is
to allow valid readers to explicitly authenticate their dominated tags without leaking the …

Listen and look: Audio–visual matching assisted speech source separation

R Lu, Z Duan, C Zhang - IEEE Signal Processing Letters, 2018 - ieeexplore.ieee.org
Source permutation, i.e., assigning separated signal snippets to wrong sources over time, is a
major issue in the state-of-the-art speaker-independent speech source separation methods …

Blind audiovisual source separation based on sparse redundant representations

AL Casanovas, G Monaci… - IEEE Transactions …, 2010 - ieeexplore.ieee.org
In this paper, we propose a novel method which is able to detect and separate audiovisual
sources present in a scene. Our method exploits the correlation between the video signal …