An overview of deep-learning-based audio-visual speech enhancement and separation
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …
Learning to separate object sounds by watching unlabeled video
Perceiving a scene most fully requires all the senses. Yet modeling how objects look and
sound is challenging: most natural scenes and events contain multiple objects, and the …
Audio-visual speech enhancement using multimodal deep convolutional neural networks
Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques
focus only on addressing audio information. In this paper, inspired by multimodal learning …
Audiovisual speech source separation: An overview of key methodologies
The separation of speech signals measured at multiple microphones in noisy and
reverberant environments using only the audio modality has limitations because there is …
Audiovisual fusion: Challenges and new approaches
AK Katsaggelos, S Bahaadini… - Proceedings of the …, 2015 - ieeexplore.ieee.org
In this paper, we review recent results on audiovisual (AV) fusion. We also discuss some of
the challenges and report on approaches to address them. One important issue in AV fusion …
Audio-visual speaker diarization based on spatiotemporal Bayesian fusion
Speaker diarization consists of assigning speech signals to people engaged in a dialogue.
An audio-visual spatiotemporal diarization model is proposed. The model is well suited for …
Audio–visual deep clustering for speech separation
Speech separation aims to separate individual voices from an audio mixture of multiple
simultaneous talkers. Audio-only approaches show unsatisfactory performance when the …
Dynamic key-updating: Privacy-preserving authentication for RFID systems
The objective of private authentication for Radio Frequency Identification (RFID) systems is
to allow valid readers to explicitly authenticate their dominated tags without leaking the …
Listen and look: Audio–visual matching assisted speech source separation
Source permutation, i.e., assigning separated signal snippets to wrong sources over time, is a
major issue in the state-of-the-art speaker-independent speech source separation methods …
Blind audiovisual source separation based on sparse redundant representations
AL Casanovas, G Monaci… - IEEE Transactions …, 2010 - ieeexplore.ieee.org
In this paper, we propose a novel method which is able to detect and separate audiovisual
sources present in a scene. Our method exploits the correlation between the video signal …