A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

AISHELL-3: A multi-speaker Mandarin TTS corpus and the baselines

Y Shi, H Bu, X Xu, S Zhang, M Li - arXiv preprint arXiv:2010.11567, 2020 - arxiv.org
In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin
speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems …

Opencpop: A high-quality open-source Chinese popular song corpus for singing voice synthesis

Y Wang, X Wang, P Zhu, J Wu, H Li, H Xue… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus
designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin …

A review on speech separation in cocktail party environment: challenges and approaches

J Agrawal, M Gupta, H Garg - Multimedia Tools and Applications, 2023 - Springer
The cocktail party problem, i.e., tracing and identifying a specific speaker's speech
while numerous speakers communicate concurrently, is one of the crucial problems still to be …

A simultaneous denoising and dereverberation framework with target decoupling

A Li, W Liu, X Luo, G Yu, C Zheng, X Li - arXiv preprint arXiv:2106.12743, 2021 - arxiv.org
Background noise and room reverberation are regarded as two major factors that degrade
subjective speech quality. In this paper, we propose an integrated framework to address …

Mega-TTS 2: Zero-shot text-to-speech with arbitrary-length speech prompts

Z Jiang, J Liu, Y Ren, J He, C Zhang, Z Ye… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot text-to-speech aims to synthesize voices from unseen speech prompts. Previous
large-scale multi-speaker TTS models have successfully achieved this goal with an enrolled …

Multilingual speech recognition for Turkic languages

S Mussakhojayeva, K Dauletbek, R Yeshpanov… - Information, 2023 - mdpi.com
The primary aim of this study was to contribute to the development of multilingual automatic
speech recognition for lower-resourced Turkic languages. Ten languages—Azerbaijani …

Wav-BERT: Cooperative acoustic and linguistic representation learning for low-resource speech recognition

G Zheng, Y Xiao, K Gong, P Zhou, X Liang… - arXiv preprint arXiv …, 2021 - arxiv.org
Unifying acoustic and linguistic representation learning has become increasingly crucial
for transferring the knowledge learned from abundant high-resource language data to low …

EMOVIE: A Mandarin emotion speech dataset with a simple emotional text-to-speech model

C Cui, Y Ren, J Liu, F Chen, R Huang, M Lei… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, there has been increasing interest in neural speech synthesis. While deep
neural networks achieve state-of-the-art results in text-to-speech (TTS) tasks, how to …

Promptspeaker: Speaker generation based on text descriptions

Y Zhang, G Liu, Y Lei, Y Chen, H Yin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Recently, text-guided content generation has received extensive attention. In this work, we
explore the possibility of text-description-based speaker generation, i.e., using text prompts to …