VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …
Transformers in speech processing: A survey
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Expressive synthetic speech is essential for many human-computer interaction and audio
broadcast scenarios, and thus synthesizing expressive speech has attracted much attention …
Privacy-preserving voice analysis via disentangled representations
Voice User Interfaces (VUIs) are increasingly popular and built into smartphones, home
assistants, and Internet of Things (IoT) devices. Despite offering an always-on convenient …
SYNT++: Utilizing imperfect synthetic data to improve speech recognition
With recent advances in speech synthesis, synthetic data is becoming a viable alternative to
real data for training speech recognition models. However, machine learning with synthetic …
Fine-grained style control in transformer-based text-to-speech synthesis
LW Chen, A Rudnicky - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
In this paper, we present a novel architecture to realize fine-grained style control on the
transformer-based text-to-speech synthesis (TransformerTTS). Specifically, we model the …
Style equalization: Unsupervised learning of controllable generative sequence models
Controllable generative sequence models with the capability to extract and replicate the
style of specific examples enable many applications, including narrating audiobooks in …
Self-supervised context-aware style representation for expressive speech synthesis
Expressive speech synthesis, like audiobook synthesis, is still challenging for style
representation learning and prediction. Deriving from reference audio or predicting style …
Fine-grained style modeling, transfer and prediction in text-to-speech synthesis via phone-level content-style disentanglement
D Tan, T Lee - arXiv preprint arXiv:2011.03943, 2020 - arxiv.org
This paper presents a novel design of neural network system for fine-grained style modeling,
transfer and prediction in expressive text-to-speech (TTS) synthesis. Fine-grained modeling …
Speaker-independent emotional voice conversion via disentangled representations
X Chen, X Xu, J Chen, Z Zhang… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Emotional Voice Conversion (EVC) technology aims to transfer emotional state in speech
while keeping the linguistic information and speaker identity unchanged. Prior studies on …