On the use of u-net for dominant melody estimation in polyphonic music

nnaudio: An on-the-fly gpu audio to spectrogram conversion toolbox using 1d convolutional neural networks

KW Cheuk, H Anderson, K Agres, D Herremans - IEEE Access, 2020 - ieeexplore.ieee.org

In this paper, we present nnAudio, a new neural network-based audio processing framework
with graphics processing unit (GPU) support that leverages 1D convolutional neural …

Cited by 106 Related articles All 6 versions

[PDF] hal.science

Fully-convolutional network for pitch estimation of speech signals

L Ardaillon, A Roebel - Interspeech 2019, 2019 - hal.science

The estimation of fundamental frequency (F0) from audio is a necessary step in many
speech processing tasks such as speech synthesis, that require to accurately analyze big …

Cited by 39 Related articles All 7 versions

[PDF] arxiv.org

Cover detection using dominant melody embeddings

G Doras, G Peeters - arXiv preprint arXiv:1907.01824, 2019 - arxiv.org

Automatic cover detection--the task of finding in an audio database all the covers of one or
several query tracks--has long been seen as a challenging theoretical problem in the MIR …

Cited by 38 Related articles All 8 versions

Comparing deep models and evaluation strategies for multi-pitch estimation in music recordings

C Weiß, G Peeters - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org

Extracting pitch information from music recordings is a challenging but important problem in
music signal processing. Frame-wise transcription or multi-pitch estimation aims for …

Cited by 9 Related articles All 3 versions

[PDF] mdpi.com

Jazz bass transcription using a U-net architecture

J Abeßer, M Müller - Electronics, 2021 - mdpi.com

In this paper, we adapt a recently proposed U-net deep neural network architecture from
melody to bass transcription. We investigate pitch shifting and random equalization as data …

Cited by 18 Related articles All 3 versions

[PDF] semanticscholar.org

[PDF] Creating DALI, a Large Dataset of Synchronized Audio, Lyrics, and Notes.

G Meseguer-Brocal… - Trans. Int. Soc. Music …, 2020 - pdfs.semanticscholar.org

The DALI dataset is a large dataset of time-aligned symbolic vocal melody notations (notes)
and lyrics at four levels of granularity. DALI contains 5358 songs in its first version and 7756 …

Cited by 26 Related articles All 6 versions

[PDF] arxiv.org

A prototypical triplet loss for cover detection

G Doras, G Peeters - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org

Automatic cover detection-the task of finding in an audio dataset all covers of a query track-
has long been a challenging theoretical problem in MIR community. It also became a …

Cited by 25 Related articles All 4 versions

[PDF] upf.edu

Combining musical features for cover detection

G Doras, F Yesiler, J Serrà Julià… - … J, Ha Lee J, McFee B …, 2020 - repositori.upf.edu

Recent works have addressed the automatic cover detection problem from a metric learning
perspective. They employ different input representations, aiming to exploit melodic or …

Cited by 16 Related articles All 4 versions

[PDF] ismir.net

[PDF] Multi-pitch Estimation meets Microphone Mismatch: Applicability of Domain Adaptation.

F Bittner, M Gonzalez, ML Richter, HM Lukashevich… - ISMIR, 2022 - archives.ismir.net

The performance of machine learning (ML) models is known to be affected by discrepancies
between training (source) and real-world (target) data distributions. This problem is referred …

Cited by 5 Related articles All 3 versions

[PDF] interspeech2020.org

[PDF] Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments.

DN Tran, U Batricevic, K Koishida - INTERSPEECH, 2020 - interspeech2020.org

Accurate voiced/unvoiced information is crucial in estimating the pitch of a target speech
signal in severe nonstationary noise environments. Nevertheless, state-of-the-art pitch …

Cited by 8 Related articles All 6 versions