Exploring transfer learning for low resource emotional TTS

P Kokol, M Kokol, S Zagoranski - Science Progress, 2022 - journals.sagepub.com

Machine Learning is an increasingly important technology dealing with the growing
complexity of the digitalised world. Despite the fact that we live in a 'Big data' world where …

Cited by 145 · Related articles · All 9 versions

Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer

Nowadays speech synthesis or text to speech (TTS), the ability of a system to produce a human-like,
natural-sounding voice from written text, is gaining popularity in the field of speech …

Cited by 41 · Related articles · All 3 versions

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier

In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Cited by 153 · Related articles · All 7 versions

InstructTTS: Modelling expressive TTS in discrete latent space with natural language style prompt

D Yang, S Liu, R Huang, C Weng… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Expressive text-to-speech (TTS) aims to synthesize speech with varying speaking styles to
better reflect human speech patterns. In this study, we attempt to use natural language as a …

Cited by 66 · Related articles · All 3 versions

PromptStyle: Controllable style transfer for text-to-speech with natural language descriptions

G Liu, Y Zhang, Y Lei, Y Chen, R Wang, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Style transfer TTS has shown impressive performance in recent years. However, style
control is often restricted to systems built on expressive speech recordings with discrete style …

Cited by 33 · Related articles · All 4 versions

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Cited by 45 · Related articles · All 7 versions

Controllable Data Generation by Deep Learning: A Review

S Wang, Y Du, X Guo, B Pan, Z Qin, L Zhao - ACM Computing Surveys, 2024 - dl.acm.org

Designing and generating new data under targeted properties has been attracting various
critical applications such as molecule design, image editing and speech synthesis …

Cited by 7 · Related articles

Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition

X Cai, D Dai, Z Wu, X Li, J Li… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Neural text-to-speech (TTS) approaches generally require a huge amount of high-quality
speech data, which makes it difficult to obtain such a dataset with extra emotion labels. In …

Cited by 76 · Related articles · All 5 versions

Low-resource expressive text-to-speech using data augmentation

G Huybrechts, T Merritt, G Comini… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

While recent neural text-to-speech (TTS) systems perform remarkably well, they typically
require a substantial amount of recordings from the target speaker reading in the desired …

Cited by 68 · Related articles · All 5 versions

Text-to-speech for low-resource agglutinative language with morphology-aware language model pre-training

R Liu, Y Hu, H Zuo, Z Luo, L Wang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Text-to-Speech (TTS) aims to convert the input text to a human-like voice. With the
development of deep learning, encoder-decoder based TTS models perform superior …

Cited by 12 · Related articles · All 2 versions