Foundation models for music: A survey
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
SafeEar: Content privacy-preserving audio deepfake detection
Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable
performance in generating realistic and natural audio. However, their dark side, audio …
Singing voice data scaling-up: An introduction to ACE-Opencpop and KiSing-v2
In singing voice synthesis (SVS), generating singing voices from musical scores faces
challenges due to limited data availability, a constraint less common in text-to-speech (TTS) …
HiFi-WaveGAN: Generative adversarial network with auxiliary spectrogram-phase loss for high-fidelity singing voice generation
Entertainment-oriented singing voice synthesis (SVS) requires a vocoder to generate high-
fidelity (e.g., 48 kHz) audio. However, most text-to-speech (TTS) vocoders cannot reconstruct …
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm
This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to
Singing Voice Synthesis (SVS) through the application of pretrained audio models in both …
TokSing: Singing Voice Synthesis based on Discrete Tokens
Recent advancements in speech synthesis witness significant benefits by leveraging
discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer …
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
It is challenging to build a multi-singer high-fidelity singing voice synthesis system with cross-
lingual ability by only using monolingual singers in the training stage. In this paper, we …
Improving Chinese pop song and Hokkien Gezi opera singing voice synthesis by enhancing local modeling
P Bai, Y Zhou, M Zheng, W Sun… - Proceedings of the 2023 …, 2023 - aclanthology.org
Singing Voice Synthesis (SVS) strives to synthesize pleasing vocals based on
music scores and lyrics. The current acoustic models based on Transformer usually process …
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
In this paper, we propose a singing voice synthesis model, Karaoker-SSL, that is trained
only on text and speech data as a typical multi-speaker acoustic model. It is a low-resource …
A High-Quality Melody-Aware Peking Opera Synthesizer Using Data Augmentation
X Zhou, W Sun, X Shi - 2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org
The performing art of Peking Opera places great demands on the singing skills of singers,
including pronunciation, melody, role, personal style and emotional expression, which …