A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

AISHELL-3: A multi-speaker Mandarin TTS corpus and the baselines

Y Shi, H Bu, X Xu, S Zhang, M Li - arXiv preprint arXiv:2010.11567, 2020 - arxiv.org
In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin
speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems …

Opencpop: A high-quality open-source Chinese popular song corpus for singing voice synthesis

Y Wang, X Wang, P Zhu, J Wu, H Li, H Xue… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus
designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin …

A review on speech separation in cocktail party environment: challenges and approaches

J Agrawal, M Gupta, H Garg - Multimedia Tools and Applications, 2023 - Springer
The cocktail party problem, i.e., tracing and identifying a specific speaker's speech
while numerous speakers communicate concurrently, is one of the crucial problems still to be …

A simultaneous denoising and dereverberation framework with target decoupling

A Li, W Liu, X Luo, G Yu, C Zheng, X Li - arXiv preprint arXiv:2106.12743, 2021 - arxiv.org
Background noise and room reverberation are regarded as two major factors that degrade
subjective speech quality. In this paper, we propose an integrated framework to address …

Mega-TTS 2: Zero-shot text-to-speech with arbitrary-length speech prompts

Z Jiang, J Liu, Y Ren, J He, C Zhang, Z Ye… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot text-to-speech aims to synthesize voices from unseen speech prompts. Previous
large-scale multi-speaker TTS models have successfully achieved this goal with an enrolled …

Multilingual speech recognition for Turkic languages

S Mussakhojayeva, K Dauletbek, R Yeshpanov… - Information, 2023 - mdpi.com
The primary aim of this study was to contribute to the development of multilingual automatic
speech recognition for lower-resourced Turkic languages. Ten languages—Azerbaijani …

Wav-BERT: Cooperative acoustic and linguistic representation learning for low-resource speech recognition

G Zheng, Y Xiao, K Gong, P Zhou, X Liang… - arXiv preprint arXiv …, 2021 - arxiv.org
Unifying acoustic and linguistic representation learning has become increasingly crucial
for transferring the knowledge learned from abundant high-resource language data to low …

EMOVIE: A Mandarin emotion speech dataset with a simple emotional text-to-speech model

C Cui, Y Ren, J Liu, F Chen, R Huang, M Lei… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, there has been increasing interest in neural speech synthesis. While deep
neural networks achieve state-of-the-art results in text-to-speech (TTS) tasks, how to …

Promptspeaker: Speaker generation based on text descriptions

Y Zhang, G Liu, Y Lei, Y Chen, H Yin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Recently, text-guided content generation has received extensive attention. In this work, we
explore the possibility of text-description-based speaker generation, i.e., using text prompts to …