FSD50K: an open dataset of human-labeled sound events

E Fonseca, X Favory, J Pons, F Font… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific,
with the exception of AudioSet, based on over 2M tracks from YouTube videos and …

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the
goal is to retrieve the audio content from a pool of candidates that best matches a given …

The benefit of temporally-strong labels in audio event classification

S Hershey, DPW Ellis, E Fonseca… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
To reveal the importance of temporal precision in ground truth audio event labels, we
collected precise (∼0.1 sec resolution) "strong" labels for a portion of the AudioSet dataset …

A survey on preprocessing and classification techniques for acoustic scene

VK Singh, K Sharma, SN Sur - Expert Systems with Applications, 2023 - Elsevier
There are lots of research papers for ASC, and in recent years it is rapidly increasing.
DCASE also provides different types of competition for the submission of several papers to …

Audio retrieval with natural language queries

AM Oncescu, A Koepke, JF Henriques, Z Akata… - arXiv preprint arXiv …, 2021 - arxiv.org
We consider the task of retrieving audio using free-form natural language queries. To study
this problem, which has received limited attention in the existing literature, we introduce …

Unsupervised contrastive learning of sound event representations

E Fonseca, D Ortego, K McGuinness… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Self-supervised representation learning can mitigate the limitations in recognition tasks with
few manually labeled data but abundant unlabeled data—a common scenario in sound …

Underwater acoustic target recognition based on depthwise separable convolution neural networks

G Hu, K Wang, L Liu - Sensors, 2021 - mdpi.com
Facing the complex marine environment, it is extremely challenging to conduct underwater
acoustic target feature extraction and recognition using ship-radiated noise. In this paper …

Receptive field regularization techniques for audio classification and tagging with deep convolutional neural networks

K Koutini, H Eghbal-zadeh… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
In this paper, we study the performance of variants of well-known Convolutional Neural
Network (CNN) architectures on different audio tasks. We show that tuning the Receptive …

Improving bird classification with unsupervised sound separation

T Denton, S Wisdom, JR Hershey - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
This paper addresses the problem of species classification in bird song recordings. The
massive amount of available field recordings of birds presents an opportunity to use …

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

Q Kong, Y Cao, T Iqbal, Y Xu, W Wang… - arXiv preprint arXiv …, 2019 - arxiv.org
The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge
focuses on audio tagging, sound event detection and spatial localisation. DCASE 2019 …