A comparison of discrete and soft speech units for improved voice conversion

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com

Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Cited by 28 - Related articles - All 7 versions

Multi-modal facial affective analysis based on masked autoencoder

W Zhang, B Ma, F Qiu, Y Ding - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Human affective behavior analysis focuses on analyzing human expressions or other
behaviors to enhance the understanding of human psychology. The CVPR 2023 …

Cited by 27 - Related articles - All 6 versions

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

Cited by 53 - Related articles - All 4 versions

Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion

AR Bargum, S Serafin, C Erkut - Frontiers in signal processing, 2024 - frontiersin.org

Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
are gaining increasing popularity. Although many of the works in the field of voice …

Cited by 2 - Related articles - All 3 versions

Voice conversion with just nearest neighbors

M Baas, B van Niekerk, H Kamper - arXiv preprint arXiv:2305.18975, 2023 - arxiv.org

Any-to-any voice conversion aims to transform source speech into a target voice with just a
few examples of the target speaker as a reference. Recent methods produce convincing …

Cited by 47 - Related articles - All 6 versions

WESPER: Zero-shot and realtime whisper to normal voice conversion for whisper-based speech interactions

J Rekimoto - Proceedings of the 2023 CHI Conference on Human …, 2023 - dl.acm.org

Recognizing whispered speech and converting it to normal speech creates many
possibilities for speech interaction. Because the sound pressure of whispered speech is …

Cited by 22 - Related articles - All 3 versions

Self-supervised learning for speech enhancement through synthesis

B Irvin, M Stamenovic, M Kegler… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Modern speech enhancement (SE) networks typically implement noise suppression through
time-frequency masking, latent representation masking, or discriminative signal prediction …

Cited by 20 - Related articles - All 4 versions

Towards General-Purpose Text-Instruction-Guided Voice Conversion

CY Kuan, CA Li, TY Hsu, TY Lin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …

Cited by 6 - Related articles - All 3 versions

Speaker anonymization using orthogonal householder neural network

X Miao, X Wang, E Cooper, J Yamagishi… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Speaker anonymization aims to conceal a speaker's identity while preserving content
information in speech. Current mainstream neural-network speaker anonymization systems …

Cited by 19 - Related articles - All 4 versions

An Effective Ensemble Learning Framework for Affective Behaviour Analysis

W Zhang, F Qiu, C Liu, L Li, H Du… - Proceedings of the …, 2024 - openaccess.thecvf.com

Affective Behavior Analysis aims to facilitate technology emotionally smart creating
a world where devices can understand and react to our emotions as humans do. To …

Cited by 1 - Related articles