Emotion intensity and its control for emotional voice conversion
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …
Speech synthesis with mixed emotions
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …
StyleTTS-VC: One-shot voice conversion by knowledge transfer from style-based TTS models
One-shot voice conversion (VC) aims to convert speech from any source speaker to an
arbitrary target speaker with only a few seconds of reference speech from the target speaker …
VISinger 2: High-fidelity end-to-end singing voice synthesis enhanced by digital signal processing synthesizer
End-to-end singing voice synthesis (SVS) model VISinger can achieve better performance
than the typical two-stage model with fewer parameters. However, VISinger has several …
Converting foreign accent speech without a reference
Foreign accent conversion (FAC) is the problem of generating a synthetic voice that has the
voice identity of a second-language (L2) learner and the pronunciation patterns of a native …
A comparative study of voice conversion models with large-scale speech and singing data: The T13 systems for the singing voice conversion challenge 2023
This paper presents our systems (denoted as T13) for the singing voice conversion
challenge (SVCC) 2023. For both in-domain and cross-domain English singing voice …
Acoustic tracking of pitch, modal, and subharmonic vibrations of vocal folds in Parkinson's disease and parkinsonism
J Hlavnička, R Čmejla, J Klempíř, E Růžička… - IEEE Access, 2019 - ieeexplore.ieee.org
The prominent and early presence of dysphonia is considered a valuable marker for
differentiation of idiopathic Parkinson's disease and parkinsonian syndromes. Objective …
A fast high-fidelity source-filter vocoder with lightweight neural modules
R Yang, Y Peng, X Hu - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
The quality of raw audio waveform generated by a vocoder could affect various audio
generative tasks. In recent years, the dominance of source-filter vocoders was greatly …
Validation of freely-available pitch detection algorithms across various noise levels in assessing speech captured by smartphone in Parkinson's disease
Measuring the fundamental frequency of the vocal folds (F0) is recognized as an important
parameter in the assessment of speech impairments in Parkinson's disease (PD). Although a …
Traditional machine learning for pitch detection
Pitch detection is a fundamental problem in speech processing as F0 is used in a large
number of applications. Recent papers have proposed deep learning for robust pitch …