Speech2Face: Learning the face behind a voice

TH Oh, T Dekel, C Kim, I Mosseri… - Proceedings of the …, 2019 - openaccess.thecvf.com
How much can we infer about a person's looks from the way they speak? In this paper, we
study the task of reconstructing a facial image of a person from a short audio recording of …

Voice-face homogeneity tells deepfake

H Cheng, Y Guo, T Wang, Q Li, X Chang… - ACM Transactions on …, 2023 - dl.acm.org
Detecting forgery videos is highly desirable due to the abuse of deepfake. Existing detection
approaches contribute to exploring the specific artifacts in deepfake videos and fit well on …

Audio-visual deep neural network for robust person verification

Y Qian, Z Chen, S Wang - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
Voice and face are the two most popular biometrics for person verification, usually used in
speaker verification and face verification tasks. It has already been observed that simply …

EmoMV: Affective music-video correspondence learning datasets for classification and retrieval

HTP Thao, G Roig, D Herremans - Information Fusion, 2023 - Elsevier
Studies in affective audio–visual correspondence learning require ground-truth data to train,
validate, and test models. The number of available datasets together with benchmarks …

Recent advances and challenges in deep audio-visual correlation learning

L Vilaça, Y Yu, P Viana - arXiv preprint arXiv:2202.13673, 2022 - arxiv.org
Audio-visual correlation learning aims to capture essential correspondences and
understand natural phenomena between audio and video. With the rapid growth of deep …

Disentangled representation learning for cross-modal biometric matching

H Ning, X Zheng, X Lu, Y Yuan - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Cross-modal biometric matching (CMBM) aims to determine the corresponding voice from a
face, or identify the corresponding face from a voice. Recently, many CMBM methods have …

Seeking the shape of sound: An adaptive framework for learning voice-face association

P Wen, Q Xu, Y Jiang, Z Yang, Y He… - Proceedings of the …, 2021 - openaccess.thecvf.com
Nowadays, we have witnessed the early progress on learning the association between
voice and face automatically, which brings a new wave of studies to the computer vision …

Audio-visual speaker recognition with a cross-modal discriminative network

R Tao, RK Das, H Li - arXiv preprint arXiv:2008.03894, 2020 - arxiv.org
Audio-visual speaker recognition is one of the tasks in the recent 2019 NIST speaker
recognition evaluation (SRE). Studies in neuroscience and computer science all point to the …

Noise-tolerant audio-visual online person verification using an attention-based neural network fusion

S Shon, TH Oh, J Glass - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
In this paper, we present a multi-modal online person verification system using both speech
and visual signals. Inspired by neuroscientific findings on the association of voice and face …

Cross-modal speaker verification and recognition: A multilingual perspective

S Nawaz, MS Saeed, P Morerio… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent years have seen a surge in finding association between faces and voices within a
cross-modal biometric application along with speaker recognition. Inspired by this, we …