A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
AISHELL-3: A multi-speaker Mandarin TTS corpus and the baselines
In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin
speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems …
Opencpop: A high-quality open source Chinese popular song corpus for singing voice synthesis
This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus
designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin …
A review on speech separation in cocktail party environment: challenges and approaches
The Cocktail party problem, which is tracing and identifying a specific speaker's speech
while numerous speakers communicate concurrently is one of the crucial problems still to be …
A simultaneous denoising and dereverberation framework with target decoupling
Background noise and room reverberation are regarded as two major factors to degrade the
subjective speech quality. In this paper, we propose an integrated framework to address …
Mega-TTS 2: Zero-shot text-to-speech with arbitrary length speech prompts
Zero-shot text-to-speech aims at synthesizing voices with unseen speech prompts. Previous
large-scale multispeaker TTS models have successfully achieved this goal with an enrolled …
Multilingual speech recognition for Turkic languages
The primary aim of this study was to contribute to the development of multilingual automatic
speech recognition for lower-resourced Turkic languages. Ten languages—Azerbaijani …
Wav-BERT: Cooperative acoustic and linguistic representation learning for low-resource speech recognition
Unifying acoustic and linguistic representation learning has become increasingly crucial to
transfer the knowledge learned on the abundance of high-resource language data for low …
Emovie: A Mandarin emotion speech dataset with a simple emotional text-to-speech model
Recently, there has been an increasing interest in neural speech synthesis. While the deep
neural network achieves the state-of-the-art result in text-to-speech (TTS) tasks, how to …
Promptspeaker: Speaker generation based on text descriptions
Recently, text-guided content generation has received extensive attention. In this work, we
explore the possibility of text description-based speaker generation, i.e., using text prompts to …