Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Attention-inspired artificial neural networks for speech processing: A systematic review

N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …

Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, i.e. unaccompanied …

LyricWhiz: Robust multilingual zero-shot lyrics transcription by whispering to ChatGPT

L Zhuo, R Yuan, J Pan, Y Ma, Y Li, G Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription
method achieving state-of-the-art performance on various lyrics transcription datasets, even …

Transfer learning of wav2vec 2.0 for automatic lyric transcription

L Ou, X Gu, Y Wang - arXiv preprint arXiv:2207.09747, 2022 - arxiv.org
Automatic speech recognition (ASR) has progressed significantly in recent years due to the
emergence of large-scale datasets and the self-supervised learning (SSL) paradigm …

Phoneme level lyrics alignment and text-informed singing voice separation

K Schulze-Forster, CSJ Doire… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org
The goal of singing voice separation is to recover the vocals signal from music mixtures.
State-of-the-art performance is achieved by deep neural networks trained in a supervised …

Genre-conditioned acoustic models for automatic lyrics transcription of polyphonic music

X Gao, C Gupta, H Li - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Lyrics transcription of polyphonic music is challenging not only because the singing vocals
are corrupted by the background music, but also because the background music and the …

MM-ALT: A multimodal automatic lyric transcription system

X Gu, L Ou, D Ong, Y Wang - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Automatic lyric transcription (ALT) is a nascent field of study attracting increasing interest
from both the speech and music information retrieval communities, given its significant …

Deep learning approaches in topics of singing information processing

C Gupta, H Li, M Goto - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Singing, the vocal production of musical tones, is one of the most important elements of
music. Addressing the needs of real-world applications, the study of technologies related to …

MSTRE-Net: Multistreaming acoustic modeling for automatic lyrics transcription

E Demirel, S Ahlbäck, S Dixon - arXiv preprint arXiv:2108.02625, 2021 - arxiv.org
This paper makes several contributions to automatic lyrics transcription (ALT) research. Our
main contribution is a novel variant of the Multistreaming Time-Delay Neural Network …