Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Multi-modal facial affective analysis based on masked autoencoder

W Zhang, B Ma, F Qiu, Y Ding - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Human affective behavior analysis focuses on analyzing human expressions or other
behaviors to enhance the understanding of human psychology. The CVPR 2023 …

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion

AR Bargum, S Serafin, C Erkut - Frontiers in Signal Processing, 2024 - frontiersin.org
Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
is gaining increasing popularity. Although many of the works in the field of voice …

Voice conversion with just nearest neighbors

M Baas, B van Niekerk, H Kamper - arXiv preprint arXiv:2305.18975, 2023 - arxiv.org
Any-to-any voice conversion aims to transform source speech into a target voice with just a
few examples of the target speaker as a reference. Recent methods produce convincing …

WESPER: Zero-shot and realtime whisper to normal voice conversion for whisper-based speech interactions

J Rekimoto - Proceedings of the 2023 CHI Conference on Human …, 2023 - dl.acm.org
Recognizing whispered speech and converting it to normal speech creates many
possibilities for speech interaction. Because the sound pressure of whispered speech is …

Self-supervised learning for speech enhancement through synthesis

B Irvin, M Stamenovic, M Kegler… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Modern speech enhancement (SE) networks typically implement noise suppression through
time-frequency masking, latent representation masking, or discriminative signal prediction …

Towards General-Purpose Text-Instruction-Guided Voice Conversion

CY Kuan, CA Li, TY Hsu, TY Lin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …

Speaker anonymization using orthogonal Householder neural network

X Miao, X Wang, E Cooper, J Yamagishi… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Speaker anonymization aims to conceal a speaker's identity while preserving content
information in speech. Current mainstream neural-network speaker anonymization systems …

An Effective Ensemble Learning Framework for Affective Behaviour Analysis

W Zhang, F Qiu, C Liu, L Li, H Du… - Proceedings of the …, 2024 - openaccess.thecvf.com
Affective Behavior Analysis aims to make technology emotionally smart, creating
a world where devices can understand and react to our emotions as humans do. To …