Cross-speaker style transfer for text-to-speech using data augmentation

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com

Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Cited by 17 · Related articles · All 7 versions

[PDF] arxiv.org

Cross-speaker emotion transfer for low-resource text-to-speech using non-parallel voice conversion with pitch-shift data augmentation

R Terashima, R Yamamoto, E Song… - arXiv preprint arXiv …, 2022 - arxiv.org

Data augmentation via voice conversion (VC) has been successfully applied to low-resource
expressive text-to-speech (TTS) when only neutral data for the target speaker are available …

Cited by 17 · Related articles · All 6 versions

[PDF] arxiv.org

Nonparallel emotional voice conversion for unseen speaker-emotion pairs using dual domain adversarial network & virtual domain pairing

N Shah, M Singh, N Takahashi… - ICASSP 2023 - 2023 IEEE …, 2023 - ieeexplore.ieee.org

Primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a
given speech signal from one style to another style without modifying the linguistic content of …

Cited by 9 · Related articles · All 8 versions

[PDF] arxiv.org

PromptVC: Flexible stylistic voice conversion in latent space driven by natural language prompts

J Yao, Y Yang, Y Lei, Z Ning, Y Hu… - ICASSP 2024 - 2024 …, 2024 - ieeexplore.ieee.org

Stylistic voice conversion aims to transform the style of source speech to a desired style
according to real-world application demands. However, the current style voice conversion …

Cited by 8 · Related articles · All 3 versions

[PDF] arxiv.org

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder

E Song, R Yamamoto, O Kwon, CH Song… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent advances in synthetic speech quality have enabled us to train text-to-speech (TTS)
systems by using synthetic corpora. However, merely increasing the amount of synthetic …

Cited by 6 · Related articles · All 6 versions

[PDF] arxiv.org

Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech

S Wang, J Guðnason, D Borth - arXiv preprint arXiv:2306.05709, 2023 - arxiv.org

Effective speech emotional representations play a key role in Speech Emotion Recognition
(SER) and Emotional Text-To-Speech (TTS) tasks. However, emotional speech samples are …

Cited by 2 · Related articles · All 6 versions

[PDF] arxiv.org

Accented text-to-speech synthesis with a conditional variational autoencoder

J Melechovsky, A Mehrish, B Sisman… - arXiv preprint arXiv …, 2022 - arxiv.org

Accent plays a significant role in speech communication, influencing understanding
capabilities and also conveying a person's identity. This paper introduces a novel and …

Cited by 3 · Related articles · All 5 versions

A High-Quality Melody-Aware Peking Opera Synthesizer Using Data Augmentation

X Zhou, W Sun, X Shi - 2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org

The performing art of Peking Opera places great demands on the singing skills of singers,
including pronunciation, melody, role, personal style and emotional expression, which …

Cited by 1 · Related articles · All 2 versions

[PDF] github.io

Nonparallel expressive TTS for unseen target speaker using style-controlled adaptive layer and optimized pitch embedding

MS Al-Radhi, TG Csapó… - … Conference on Speech …, 2023 - ieeexplore.ieee.org

Recent advancements in text-to-speech (TTS) systems have focused on developing style-
controlled models that generate speech with desired characteristics such as accent, tone …

Cited by 1 · Related articles · All 2 versions

[PDF] frontiersin.org

Robot reads ads: likability of calm and energetic audio advertising styles transferred to synthesized voices

H Pajupuu, J Pajupuu, R Altrov, I Kiissel - Frontiers in Communication, 2023 - frontiersin.org

The increasing prevalence of audio advertising has provided a challenge to find out more
about voices and performance styles used in advertisements. In this study, we were …

Cited by 2 · Related articles · All 3 versions