Face-voice matching using cross-modal embeddings

Speech2face: Learning the face behind a voice

TH Oh, T Dekel, C Kim, I Mosseri… - Proceedings of the …, 2019 - openaccess.thecvf.com

How much can we infer about a person's looks from the way they speak? In this paper, we
study the task of reconstructing a facial image of a person from a short audio recording of …

Cited by 187 · Related articles · All 10 versions

Voice-face homogeneity tells deepfake

H Cheng, Y Guo, T Wang, Q Li, X Chang… - ACM Transactions on …, 2023 - dl.acm.org

Detecting forgery videos is highly desirable due to the abuse of deepfake. Existing detection
approaches contribute to exploring the specific artifacts in deepfake videos and fit well on …

Cited by 51 · Related articles · All 4 versions

Audio-visual deep neural network for robust person verification

Y Qian, Z Chen, S Wang - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org

Voice and face are two most popular biometrics for person verification, usually used in
speaker verification and face verification tasks. It has already been observed that simply …

Cited by 53 · Related articles · All 3 versions

EmoMV: Affective music-video correspondence learning datasets for classification and retrieval

HTP Thao, G Roig, D Herremans - Information Fusion, 2023 - Elsevier

Studies in affective audio–visual correspondence learning require ground-truth data to train,
validate, and test models. The number of available datasets together with benchmarks …

Cited by 13 · Related articles · All 6 versions

Recent advances and challenges in deep audio-visual correlation learning

L Vilaça, Y Yu, P Viana - arXiv preprint arXiv:2202.13673, 2022 - arxiv.org

Audio-visual correlation learning aims to capture essential correspondences and
understand natural phenomena between audio and video. With the rapid growth of deep …

Cited by 7 · Related articles · All 3 versions

Disentangled representation learning for cross-modal biometric matching

H Ning, X Zheng, X Lu, Y Yuan - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Cross-modal biometric matching (CMBM) aims to determine the corresponding voice from a
face, or identify the corresponding face from a voice. Recently, many CMBM methods have …

Cited by 32 · Related articles · All 3 versions

Seeking the shape of sound: An adaptive framework for learning voice-face association

P Wen, Q Xu, Y Jiang, Z Yang, Y He… - Proceedings of the …, 2021 - openaccess.thecvf.com

Nowadays, we have witnessed the early progress on learning the association between
voice and face automatically, which brings a new wave of studies to the computer vision …

Cited by 29 · Related articles · All 6 versions

Audio-visual speaker recognition with a cross-modal discriminative network

R Tao, RK Das, H Li - arXiv preprint arXiv:2008.03894, 2020 - arxiv.org

Audio-visual speaker recognition is one of the tasks in the recent 2019 NIST speaker
recognition evaluation (SRE). Studies in neuroscience and computer science all point to the …

Cited by 37 · Related articles · All 9 versions

Noise-tolerant audio-visual online person verification using an attention-based neural network fusion

S Shon, TH Oh, J Glass - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

In this paper, we present a multi-modal online person verification system using both speech
and visual signals. Inspired by neuroscientific findings on the association of voice and face …

Cited by 53 · Related articles · All 8 versions

Cross-modal speaker verification and recognition: A multilingual perspective

S Nawaz, MS Saeed, P Morerio… - Proceedings of the …, 2021 - openaccess.thecvf.com

Recent years have seen a surge in finding association between faces and voices within a
cross-modal biometric application along with speaker recognition. Inspired from this, we …

Cited by 23 · Related articles · All 9 versions