Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

StyleTTS-VC: One-shot voice conversion by knowledge transfer from style-based TTS models

YA Li, C Han, N Mesgarani - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org
One-shot voice conversion (VC) aims to convert speech from any source speaker to an
arbitrary target speaker with only a few seconds of reference speech from the target speaker …

VISinger 2: High-fidelity end-to-end singing voice synthesis enhanced by digital signal processing synthesizer

Y Zhang, H Xue, H Li, L Xie, T Guo, R Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end singing voice synthesis (SVS) model VISinger can achieve better performance
than the typical two-stage model with fewer parameters. However, VISinger has several …

Converting foreign accent speech without a reference

G Zhao, S Ding… - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
Foreign accent conversion (FAC) is the problem of generating a synthetic voice that has the
voice identity of a second-language (L2) learner and the pronunciation patterns of a native …

A comparative study of voice conversion models with large-scale speech and singing data: The T13 systems for the singing voice conversion challenge 2023

R Yamamoto, R Yoneyama, LP Violeta… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
This paper presents our systems (denoted as T13) for the singing voice conversion
challenge (SVCC) 2023. For both in-domain and cross-domain English singing voice …

Acoustic tracking of pitch, modal, and subharmonic vibrations of vocal folds in Parkinson's disease and parkinsonism

J Hlavnička, R Čmejla, J Klempíř, E Růžička… - IEEE Access, 2019 - ieeexplore.ieee.org
The prominent and early presence of dysphonia is considered a valuable marker for
differentiation of idiopathic Parkinson's disease and parkinsonian syndromes. Objective …

A fast high-fidelity source-filter vocoder with lightweight neural modules

R Yang, Y Peng, X Hu - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
The quality of raw audio waveform generated by a vocoder could affect various audio
generative tasks. In recent years, the dominance of source-filter vocoders was greatly …

Validation of freely-available pitch detection algorithms across various noise levels in assessing speech captured by smartphone in Parkinson's disease

V Illner, P Sovka, J Rusz - Biomedical Signal Processing and Control, 2020 - Elsevier
Measuring the fundamental frequency of the vocal folds F0 is recognized as an important
parameter in the assessment of speech impairments in Parkinson's disease (PD). Although a …

Traditional machine learning for pitch detection

T Drugman, G Huybrechts, V Klimkov… - IEEE Signal …, 2018 - ieeexplore.ieee.org
Pitch detection is a fundamental problem in speech processing as F0 is used in a large
number of applications. Recent papers have proposed deep learning for robust pitch …