Machine learning on small size samples: A synthetic knowledge synthesis

P Kokol, M Kokol, S Zagoranski - Science Progress, 2022 - journals.sagepub.com
Machine Learning is an increasingly important technology dealing with the growing
complexity of the digitalised world. Despite the fact that we live in a 'Big data' world where …

Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays, speech synthesis or text-to-speech (TTS), the ability of a system to produce a human-like,
natural-sounding voice from written text, is gaining popularity in the field of speech …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

InstructTTS: Modelling expressive TTS in discrete latent space with natural language style prompt

D Yang, S Liu, R Huang, C Weng… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Expressive text-to-speech (TTS) aims to synthesize speech with varying speaking styles to
better reflect human speech patterns. In this study, we attempt to use natural language as a …

PromptStyle: Controllable style transfer for text-to-speech with natural language descriptions

G Liu, Y Zhang, Y Lei, Y Chen, R Wang, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Style transfer TTS has shown impressive performance in recent years. However, style
control is often restricted to systems built on expressive speech recordings with discrete style …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Controllable Data Generation by Deep Learning: A Review

S Wang, Y Du, X Guo, B Pan, Z Qin, L Zhao - ACM Computing Surveys, 2024 - dl.acm.org
Designing and generating new data under targeted properties has been attracting various
critical applications such as molecule design, image editing and speech synthesis …

Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition

X Cai, D Dai, Z Wu, X Li, J Li… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Neural text-to-speech (TTS) approaches generally require a huge amount of high-quality
speech data, which makes it difficult to obtain such a dataset with extra emotion labels. In …

Low-resource expressive text-to-speech using data augmentation

G Huybrechts, T Merritt, G Comini… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
While recent neural text-to-speech (TTS) systems perform remarkably well, they typically
require a substantial amount of recordings from the target speaker reading in the desired …

Text-to-speech for low-resource agglutinative language with morphology-aware language model pre-training

R Liu, Y Hu, H Zuo, Z Luo, L Wang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Text-to-Speech (TTS) aims to convert the input text to a human-like voice. With the
development of deep learning, encoder-decoder based TTS models perform superior …