Noise-robust voice conversion with domain adversarial training
Voice conversion has made great progress in the past few years under the studio-quality test
scenario in terms of speech quality and speaker similarity. However, in real applications, test …
Deep feature CycleGANs: Speaker identity preserving non-parallel microphone-telephone domain adaptation for speaker verification
With the increase in the availability of speech from varied domains, it is imperative to use
such out-of-domain data to improve existing speech systems. Domain adaptation is a …
SIG-VC: A speaker information guided zero-shot voice conversion system for both human beings and machines
Nowadays, as more and more systems achieve good performance in traditional voice
conversion (VC) tasks, people's attention gradually turns to VC tasks under extreme …
Region normalized capsule network based generative adversarial network for non-parallel voice conversion
Voice conversion (VC) involves altering the vocal characteristics of a source speaker to
resemble those of a target speaker while maintaining the same linguistic content. Recently …
Voice conversion using feature specific loss function based self-attentive generative adversarial network
Voice conversion (VC) is the process of converting the vocal texture of a source speaker
similar to that of a target speaker without altering the content of the source speaker's speech …
HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children
C Yu, S Wang, D Zhang, Y Zhang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Numerous children diagnosed with Autism Spectrum Disorder (ASD) exhibit abnormal eye
gaze pattern in communication and social interaction. In this study, we aim to investigate the …
Audio-visual speech synthesis using vision transformer–enhanced autoencoders with ensemble of loss functions
Audio-visual speech synthesis (AVSS) has garnered attention in recent years for its utility in
the realm of audio-visual learning. AVSS transforms one speaker's speech into another's …
An analysis of performance evaluation metrics for voice conversion models
The process of transforming a source speaker's vocal style or vocal feature to that of a target
speaker while keeping the linguistic information of the source speaker unchanged is known …
Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation.
Cross-Lingual Voice Conversion (XVC) aims to modify a source speaker identity
towards a target while preserving the source linguistic content. This paper introduces a cycle …
FID-RPRGAN-VC: Fréchet inception distance loss based region-wise position normalized relativistic GAN for non-parallel voice conversion
Voice conversion (VC) is the speech-to-speech (STS) synthesis process that converts the
vocal identity of a source speaker to a target speaker by keeping the linguistic content …