VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …
Glottal closure and opening instant detection from speech signals
This paper proposes a new procedure to detect Glottal Closure and Opening Instants (GCIs
and GOIs) directly from speech waveforms. The procedure is divided into two successive …
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Residual neural networks (ResNet) demonstrate impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …
End-to-end voice conversion via cross-modal knowledge distillation for dysarthric speech reconstruction
Dysarthric speech reconstruction (DSR) is a challenging task due to difficulties in repairing
unstable prosody and correcting imprecise articulation. Inspired by the success of sequence …
Generative adversarial networks for unpaired voice transformation on impaired speech
This paper focuses on using voice conversion (VC) to improve the speech intelligibility of
surgical patients who have had parts of their articulators removed. Due to the difficulty of …
A fast method for high-resolution voiced/unvoiced detection and glottal closure/opening instant estimation of speech
AI Koutrouvelis, GP Kafentzis… - … on Audio, Speech …, 2015 - ieeexplore.ieee.org
We propose a fast speech analysis method which simultaneously performs high-resolution
voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal …
Rhythm-flexible voice conversion without parallel data using Cycle-GAN over phoneme posteriorgram sequences
Speaking rate refers to the average number of phonemes within some unit time, while the
rhythmic patterns refer to duration distributions for realizations of different phonemes within …
Voice conversion: A critical survey
AF Machado, MG Queiroz - Proceedings, 2010 - repositorio.usp.br
Voice conversion is an emergent problem in voice and speech processing with increasing
commercial interest, due to applications such as Speech-to-Speech Translation (SST) and …
FlowCPCVC: A Contrastive Predictive Coding Supervised Flow Framework for Any-to-Any Voice Conversion
J Huang, W Xu, Y Li, J Liu, D Ma, W Xiang - Interspeech, 2022 - isca-archive.org
Recently, research on any-to-any voice conversion (VC) has developed rapidly.
However, they often suffer from unsatisfactory quality and require two stages for training, in …
Two-stage and self-supervised voice conversion for zero-shot dysarthric speech reconstruction
Dysarthria is a motor speech disorder commonly associated with conditions such as
cerebral palsy, Parkinson's disease, amyotrophic lateral sclerosis, and stroke. Individuals …