Noise-robust voice conversion with domain adversarial training

H Du, L Xie, H Li - Neural Networks, 2022 - Elsevier
Voice conversion has made great progress in the past few years under the studio-quality test
scenario in terms of speech quality and speaker similarity. However, in real applications, test …

Deep feature CycleGANs: Speaker identity preserving non-parallel microphone-telephone domain adaptation for speaker verification

S Kataria, J Villalba, P Żelasko… - arXiv preprint arXiv …, 2021 - arxiv.org
With the increase in the availability of speech from varied domains, it is imperative to use
such out-of-domain data to improve existing speech systems. Domain adaptation is a …

SIG-VC: A speaker information guided zero-shot voice conversion system for both human beings and machines

H Zhang, Z Cai, X Qin, M Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Nowadays, as more and more systems achieve good performance in traditional voice
conversion (VC) tasks, people's attention gradually turns to VC tasks under extreme …

Region normalized capsule network based generative adversarial network for non-parallel voice conversion

MT Akhter, P Banerjee, S Dhar, S Ghosh… - … Conference on Speech …, 2023 - Springer
Voice conversion (VC) involves altering the vocal characteristics of a source speaker to
resemble those of a target speaker while maintaining the same linguistic content. Recently …

Voice conversion using feature specific loss function based self-attentive generative adversarial network

S Dhar, P Banerjee, ND Jana… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Voice conversion (VC) is the process of converting the vocal texture of a source speaker
similar to that of a target speaker without altering the content of the source speaker's speech …

HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children

C Yu, S Wang, D Zhang, Y Zhang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Numerous children diagnosed with Autism Spectrum Disorder (ASD) exhibit abnormal eye
gaze pattern in communication and social interaction. In this study, we aim to investigate the …

Audio-visual speech synthesis using vision transformer–enhanced autoencoders with ensemble of loss functions

S Ghosh, S Sarkar, S Ghosh, F Zalkow, ND Jana - Applied Intelligence, 2024 - Springer
Audio-visual speech synthesis (AVSS) has garnered attention in recent years for its utility in
the realm of audio-visual learning. AVSS transforms one speaker's speech into another's …

An analysis of performance evaluation metrics for voice conversion models

MT Akhter, P Banerjee, S Dhar… - 2022 IEEE 19th India …, 2022 - ieeexplore.ieee.org
The process of transforming a source speaker's vocal style or vocal feature to that of a target
speaker while keeping the linguistic information of the source speaker unchanged is known …

Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation

Y Zhou, X Tian, Z Wu, H Li - Interspeech, 2021 - isca-archive.org
Cross-Lingual Voice Conversion (XVC) aims to modify a source speaker identity
towards a target while preserving the source linguistic content. This paper introduces a cycle …

FID-RPRGAN-VC: Fréchet inception distance loss based region-wise position normalized relativistic GAN for non-parallel voice conversion

S Dhar, MDT Akhter, P Banerjee… - 2023 Asia Pacific …, 2023 - ieeexplore.ieee.org
Voice conversion (VC) is the speech-to-speech (STS) synthesis process that converts the
vocal identity of a source speaker to a target speaker by keeping the linguistic content …