Transformation of speaker characteristics for voice conversion

VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion

D Wang, L Deng, YT Yeung, X Chen, X Liu… - arXiv preprint arXiv …, 2021 - arxiv.org

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …

Cited by 166; 8 versions

Glottal closure and opening instant detection from speech signals

T Drugman, T Dutoit - arXiv preprint arXiv:2001.00841, 2019 - arxiv.org

This paper proposes a new procedure to detect Glottal Closure and Opening Instants (GCIs
and GOIs) directly from speech waveforms. The procedure is divided into two successive …

Cited by 208; 9 versions

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org

The residual neural networks (ResNet) demonstrate the impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Cited by 13; 4 versions

End-to-end voice conversion via cross-modal knowledge distillation for dysarthric speech reconstruction

D Wang, J Yu, X Wu, S Liu, L Sun… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Dysarthric speech reconstruction (DSR) is a challenging task due to difficulties in repairing
unstable prosody and correcting imprecise articulation. Inspired by the success of sequence …

Cited by 47; 4 versions

Generative adversarial networks for unpaired voice transformation on impaired speech

LW Chen, HY Lee, Y Tsao - arXiv preprint arXiv:1810.12656, 2018 - arxiv.org

This paper focuses on using voice conversion (VC) to improve the speech intelligibility of
surgical patients who have had parts of their articulators removed. Due to the difficulty of …

Cited by 36; 8 versions

A fast method for high-resolution voiced/unvoiced detection and glottal closure/opening instant estimation of speech

AI Koutrouvelis, GP Kafentzis… - … on Audio, Speech …, 2015 - ieeexplore.ieee.org

We propose a fast speech analysis method which simultaneously performs high-resolution
voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal …

Cited by 43; 9 versions

Rhythm-flexible voice conversion without parallel data using Cycle-GAN over phoneme posteriorgram sequences

C Yeh, P Hsu, J Chou, H Lee… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org

Speaking rate refers to the average number of phonemes within some unit time, while the
rhythmic patterns refer to duration distributions for realizations of different phonemes within …

Cited by 31; 6 versions

Voice conversion: A critical survey

AF Machado, MG Queiroz - Proceedings, 2010 - repositorio.usp.br

Voice conversion is an emergent problem in voice and speech processing with increasing
commercial interest, due to applications such as Speech-to-Speech Translation (SST) and …

Cited by 52; 5 versions

FlowCPCVC: A Contrastive Predictive Coding Supervised Flow Framework for Any-to-Any Voice Conversion

J Huang, W Xu, Y Li, J Liu, D Ma, W Xiang - Interspeech, 2022 - isca-archive.org

Recently, the research of any-to-any voice conversion (VC) has been developed rapidly.
However, they often suffer from unsatisfactory quality and require two stages for training, in …

Cited by 6; 4 versions

Two-stage and self-supervised voice conversion for zero-shot dysarthric speech reconstruction

D Liu, Y Lin, H Bu, M Li - 2024 International Conference on …, 2024 - ieeexplore.ieee.org

Dysarthria is a motor speech disorder commonly associated with conditions such as
cerebral palsy, Parkinson's disease, amyotrophic lateral sclerosis, and stroke. Individuals …

Cited by 2; 4 versions