Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Cross-speaker emotion transfer for low-resource text-to-speech using non-parallel voice conversion with pitch-shift data augmentation

R Terashima, R Yamamoto, E Song… - arXiv preprint arXiv …, 2022 - arxiv.org
Data augmentation via voice conversion (VC) has been successfully applied to low-resource
expressive text-to-speech (TTS) when only neutral data for the target speaker are available …

Nonparallel emotional voice conversion for unseen speaker-emotion pairs using dual domain adversarial network & virtual domain pairing

N Shah, M Singh, N Takahashi… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a
given speech signal from one style to another style without modifying the linguistic content of …

PromptVC: Flexible stylistic voice conversion in latent space driven by natural language prompts

J Yao, Y Yang, Y Lei, Z Ning, Y Hu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Stylistic voice conversion aims to transform the style of source speech to a desired style
according to real-world application demands. However, the current style voice conversion …

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder

E Song, R Yamamoto, O Kwon, CH Song… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent advances in synthetic speech quality have enabled us to train text-to-speech (TTS)
systems by using synthetic corpora. However, merely increasing the amount of synthetic …

Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech

S Wang, J Guðnason, D Borth - arXiv preprint arXiv:2306.05709, 2023 - arxiv.org
Effective speech emotional representations play a key role in Speech Emotion Recognition
(SER) and Emotional Text-To-Speech (TTS) tasks. However, emotional speech samples are …

Accented text-to-speech synthesis with a conditional variational autoencoder

J Melechovsky, A Mehrish, B Sisman… - arXiv preprint arXiv …, 2022 - arxiv.org
Accent plays a significant role in speech communication, influencing understanding
capabilities and also conveying a person's identity. This paper introduces a novel and …

A High-Quality Melody-Aware Peking Opera Synthesizer Using Data Augmentation

X Zhou, W Sun, X Shi - 2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org
The performing art of Peking Opera places great demands on the singing skills of singers,
including pronunciation, melody, role, personal style and emotional expression, which …

Nonparallel expressive TTS for unseen target speaker using style-controlled adaptive layer and optimized pitch embedding

MS Al-Radhi, TG Csapó… - … Conference on Speech …, 2023 - ieeexplore.ieee.org
Recent advancements in text-to-speech (TTS) systems have focused on developing
style-controlled models that generate speech with desired characteristics such as accent, tone …

Robot reads ads: likability of calm and energetic audio advertising styles transferred to synthesized voices

H Pajupuu, J Pajupuu, R Altrov, I Kiissel - Frontiers in Communication, 2023 - frontiersin.org
The increasing prevalence of audio advertising has provided a challenge to find out more
about voices and performance styles used in advertisements. In this study, we were …