XiaoiceSing 2: A high-fidelity singing voice synthesizer based on generative adversarial network

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Cited by 10 · Related articles · All 4 versions

SafeEar: Content privacy-preserving audio deepfake detection

X Li, K Li, Y Zheng, C Yan, X Ji, W Xu - Proceedings of the 2024 on ACM …, 2024 - dl.acm.org

Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable
performance in generating realistic and natural audio. However, their dark side, audio …

Cited by 4 · Related articles · All 5 versions

Singing voice data scaling-up: An introduction to ACE-Opencpop and KiSing-v2

J Shi, Y Lin, X Bai, K Zhang, Y Wu, Y Tang, Y Yu… - arXiv preprint arXiv …, 2024 - arxiv.org

In singing voice synthesis (SVS), generating singing voices from musical scores faces
challenges due to limited data availability, a constraint less common in text-to-speech (TTS) …

Cited by 11 · Related articles · All 2 versions

HiFi-WaveGAN: Generative adversarial network with auxiliary spectrogram-phase loss for high-fidelity singing voice generation

C Wang, C Zeng, J Chen, O Xue - International Symposium on Neural …, 2024 - Springer

Entertainment-oriented singing voice synthesis (SVS) requires a vocoder to generate high-fidelity
(e.g., 48 kHz) audio. However, most text-to-speech (TTS) vocoders cannot reconstruct …

Cited by 10 · Related articles · All 5 versions

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm

Y Wu, J Shi, Y Yu, Y Tang, T Qian, Y Lin, J Han… - Proceedings of the …, 2024 - dl.acm.org

This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to
Singing Voice Synthesis (SVS) through the application of pretrained audio models in both …

Cited by 1 · Related articles · All 3 versions

TokSing: Singing Voice Synthesis based on Discrete Tokens

Y Wu, J Shi, Y Tang, S Yang, Q Jin - arXiv preprint arXiv:2406.08416, 2024 - arxiv.org

Recent advancements in speech synthesis witness significant benefits by leveraging
discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer …

Cited by 5 · Related articles · All 2 versions

CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers

X Wang, C Zeng, J Chen… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

It is challenging to build a multi-singer high-fidelity singing voice synthesis system with cross-lingual
ability by only using monolingual singers in the training stage. In this paper, we …

Cited by 5 · Related articles · All 5 versions

Improving Chinese pop song and Hokkien Gezi Opera singing voice synthesis by enhancing local modeling

P Bai, Y Zhou, M Zheng, W Sun… - Proceedings of the 2023 …, 2023 - aclanthology.org

Singing Voice Synthesis (SVS) strives to synthesize pleasing vocals based on
music scores and lyrics. The current acoustic models based on Transformer usually process …

Cited by 1 · Related articles · All 2 versions

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations

P Kakoulidis, N Ellinas, G Vamvoukakis… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we propose a singing voice synthesis model, Karaoker-SSL, that is trained
only on text and speech data as a typical multi-speaker acoustic model. It is a low-resource …

Cited by 1 · Related articles · All 4 versions

A High-Quality Melody-Aware Peking Opera Synthesizer Using Data Augmentation

X Zhou, W Sun, X Shi - 2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org

The performing art of Peking Opera places great demands on the singing skills of singers,
including pronunciation, melody, role, personal style and emotional expression, which …

Cited by 1 · Related articles · All 2 versions