nnAudio: An on-the-fly GPU audio to spectrogram conversion toolbox using 1D convolutional neural networks

KW Cheuk, H Anderson, K Agres, D Herremans - IEEE Access, 2020 - ieeexplore.ieee.org
In this paper, we present nnAudio, a new neural network-based audio processing framework
with graphics processing unit (GPU) support that leverages 1D convolutional neural …

Fully-convolutional network for pitch estimation of speech signals

L Ardaillon, A Roebel - Interspeech 2019, 2019 - hal.science
The estimation of fundamental frequency (F0) from audio is a necessary step in many
speech processing tasks such as speech synthesis, that require to accurately analyze big …

Cover detection using dominant melody embeddings

G Doras, G Peeters - arXiv preprint arXiv:1907.01824, 2019 - arxiv.org
Automatic cover detection--the task of finding in an audio database all the covers of one or
several query tracks--has long been seen as a challenging theoretical problem in the MIR …

Comparing deep models and evaluation strategies for multi-pitch estimation in music recordings

C Weiß, G Peeters - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org
Extracting pitch information from music recordings is a challenging but important problem in
music signal processing. Frame-wise transcription or multi-pitch estimation aims for …

Jazz bass transcription using a U-net architecture

J Abeßer, M Müller - Electronics, 2021 - mdpi.com
In this paper, we adapt a recently proposed U-net deep neural network architecture from
melody to bass transcription. We investigate pitch shifting and random equalization as data …

Creating DALI, a Large Dataset of Synchronized Audio, Lyrics, and Notes.

G Meseguer-Brocal… - Trans. Int. Soc. Music …, 2020 - pdfs.semanticscholar.org
The DALI dataset is a large dataset of time-aligned symbolic vocal melody notations (notes)
and lyrics at four levels of granularity. DALI contains 5358 songs in its first version and 7756 …

A prototypical triplet loss for cover detection

G Doras, G Peeters - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
Automatic cover detection--the task of finding in an audio dataset all covers of a query track--
has long been a challenging theoretical problem in MIR community. It also became a …

Combining musical features for cover detection

G Doras, F Yesiler, J Serrà Julià… - … J, Ha Lee J, McFee B …, 2020 - repositori.upf.edu
Recent works have addressed the automatic cover detection problem from a metric learning
perspective. They employ different input representations, aiming to exploit melodic or …

Multi-pitch Estimation meets Microphone Mismatch: Applicability of Domain Adaptation.

F Bittner, M Gonzalez, ML Richter, HM Lukashevich… - ISMIR, 2022 - archives.ismir.net
The performance of machine learning (ML) models is known to be affected by discrepancies
between training (source) and real-world (target) data distributions. This problem is referred …

Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments.

DN Tran, U Batricevic, K Koishida - INTERSPEECH, 2020 - interspeech2020.org
Accurate voiced/unvoiced information is crucial in estimating the pitch of a target speech
signal in severe nonstationary noise environments. Nevertheless, state-of-the-art pitch …