Controlling Emotion in Text-to-Speech with Natural Language Prompts

T Bott, F Lux, NT Vu - arXiv preprint arXiv:2406.06406, 2024 - arxiv.org
In recent years, prompting has quickly become one of the standard ways of steering the
outputs of generative machine learning models, due to its intuitive use of natural language …

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

X Qi, R Fu, Z Wen, J Tao, S Shi, Y Lu, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation
(LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with …

FastSpeechStyle: Fast, Emotion Controllable, and High-Quality Speech Synthesis

VT Nguyen, TN Do, HC Pham, TV Ho… - … Journal of Asian …, 2022 - World Scientific
Non-autoregressive text-to-speech models such as FastSpeech2 can fast synthesize
high-quality speech. This model also allows explicit control of the speech signal's pitch, energy …