Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …
Multi-modal facial affective analysis based on masked autoencoder
Human affective behavior analysis focuses on analyzing human expressions or other
behaviors to enhance the understanding of human psychology. The CVPR 2023 …
The singing voice conversion challenge 2023
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …
Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion
Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
is gaining increasing popularity. Although many of the works in the field of voice …
Voice conversion with just nearest neighbors
Any-to-any voice conversion aims to transform source speech into a target voice with just a
few examples of the target speaker as a reference. Recent methods produce convincing …
WESPER: Zero-shot and realtime whisper to normal voice conversion for whisper-based speech interactions
J Rekimoto - Proceedings of the 2023 CHI Conference on Human …, 2023 - dl.acm.org
Recognizing whispered speech and converting it to normal speech creates many
possibilities for speech interaction. Because the sound pressure of whispered speech is …
Self-supervised learning for speech enhancement through synthesis
Modern speech enhancement (SE) networks typically implement noise suppression through
time-frequency masking, latent representation masking, or discriminative signal prediction …
Towards General-Purpose Text-Instruction-Guided Voice Conversion
This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …
Speaker anonymization using orthogonal householder neural network
Speaker anonymization aims to conceal a speaker's identity while preserving content
information in speech. Current mainstream neural-network speaker anonymization systems …
An Effective Ensemble Learning Framework for Affective Behaviour Analysis
Affective Behavior Analysis aims to facilitate technology emotionally smart creating
a world where devices can understand and react to our emotions as humans do. To …