VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion

D Wang, L Deng, YT Yeung, X Chen, X Liu… - arXiv preprint arXiv …, 2021 - arxiv.org
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …

Glottal closure and opening instant detection from speech signals

T Drugman, T Dutoit - arXiv preprint arXiv:2001.00841, 2019 - arxiv.org
This paper proposes a new procedure to detect Glottal Closure and Opening Instants (GCIs
and GOIs) directly from speech waveforms. The procedure is divided into two successive …

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
The residual neural networks (ResNet) demonstrate the impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

End-to-end voice conversion via cross-modal knowledge distillation for dysarthric speech reconstruction

D Wang, J Yu, X Wu, S Liu, L Sun… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Dysarthric speech reconstruction (DSR) is a challenging task due to difficulties in repairing
unstable prosody and correcting imprecise articulation. Inspired by the success of sequence …

Generative adversarial networks for unpaired voice transformation on impaired speech

LW Chen, HY Lee, Y Tsao - arXiv preprint arXiv:1810.12656, 2018 - arxiv.org
This paper focuses on using voice conversion (VC) to improve the speech intelligibility of
surgical patients who have had parts of their articulators removed. Due to the difficulty of …

A fast method for high-resolution voiced/unvoiced detection and glottal closure/opening instant estimation of speech

AI Koutrouvelis, GP Kafentzis… - … on Audio, Speech …, 2015 - ieeexplore.ieee.org
We propose a fast speech analysis method which simultaneously performs high-resolution
voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal …

Rhythm-flexible voice conversion without parallel data using Cycle-GAN over phoneme posteriorgram sequences

C Yeh, P Hsu, J Chou, H Lee… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Speaking rate refers to the average number of phonemes within some unit time, while the
rhythmic patterns refer to duration distributions for realizations of different phonemes within …

Voice conversion: A critical survey

AF Machado, MG Queiroz - Proceedings, 2010 - repositorio.usp.br
Voice conversion is an emergent problem in voice and speech processing with increasing
commercial interest, due to applications such as Speech-to-Speech Translation (SST) and …

FlowCPCVC: A Contrastive Predictive Coding Supervised Flow Framework for Any-to-Any Voice Conversion

J Huang, W Xu, Y Li, J Liu, D Ma, W Xiang - Interspeech, 2022 - isca-archive.org
Recently, the research of any-to-any voice conversion (VC) has been developed rapidly.
However, they often suffer from unsatisfactory quality and require two stages for training, in …

Two-stage and self-supervised voice conversion for zero-shot dysarthric speech reconstruction

D Liu, Y Lin, H Bu, M Li - 2024 International Conference on …, 2024 - ieeexplore.ieee.org
Dysarthria is a motor speech disorder commonly associated with conditions such as
cerebral palsy, Parkinson's disease, amyotrophic lateral sclerosis, and stroke. Individuals …