Foundation models for music: A survey
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
Attention-inspired artificial neural networks for speech processing: A systematic review
N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …
human brain and have been widely applied in speech processing. The application areas of …
Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, ie unaccompanied …
in music. Lyrics and chords are generally essential information in music, ie unaccompanied …
Lyricwhiz: Robust multilingual zero-shot lyrics transcription by whispering to chatgpt
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription
method achieving state-of-the-art performance on various lyrics transcription datasets, even …
method achieving state-of-the-art performance on various lyrics transcription datasets, even …
Transfer learning of wav2vec 2.0 for automatic lyric transcription
Automatic speech recognition (ASR) has progressed significantly in recent years due to the
emergence of large-scale datasets and the self-supervised learning (SSL) paradigm …
emergence of large-scale datasets and the self-supervised learning (SSL) paradigm …
Phoneme level lyrics alignment and text-informed singing voice separation
K Schulze-Forster, CSJ Doire… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org
The goal of singing voice separation is to recover the vocals signal from music mixtures.
State-of-the-art performance is achieved by deep neural networks trained in a supervised …
State-of-the-art performance is achieved by deep neural networks trained in a supervised …
Genre-conditioned acoustic models for automatic lyrics transcription of polyphonic music
Lyrics transcription of polyphonic music is challenging not only because the singing vocals
are corrupted by the background music, but also because the background music and the …
are corrupted by the background music, but also because the background music and the …
Mm-alt: A multimodal automatic lyric transcription system
Automatic lyric transcription (ALT) is a nascent field of study attracting increasing interest
from both the speech and music information retrieval communities, given its significant …
from both the speech and music information retrieval communities, given its significant …
Deep learning approaches in topics of singing information processing
Singing, the vocal productionof musical tones, is one of the most important elements of
music. Addressing the needs of real-world applications, the study of technologies related to …
music. Addressing the needs of real-world applications, the study of technologies related to …
MSTRE-Net: Multistreaming acoustic modeling for automatic lyrics transcription
This paper makes several contributions to automatic lyrics transcription (ALT) research. Our
main contribution is a novel variant of the Multistreaming Time-Delay Neural Network …
main contribution is a novel variant of the Multistreaming Time-Delay Neural Network …