Audio tagging with noisy labels and minimal supervision

E Fonseca, X Favory, J Pons, F Font… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org

Most existing datasets for sound event recognition (SER) are relatively small and/or
domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and …

Cited by 422 Related articles All 5 versions

[PDF] arxiv.org

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org

The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the
goal is to retrieve the audio content from a pool of candidates that best matches a given …

Cited by 93 Related articles All 10 versions

[PDF] arxiv.org

The benefit of temporally-strong labels in audio event classification

S Hershey, DPW Ellis, E Fonseca… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

To reveal the importance of temporal precision in ground truth audio event labels, we
collected precise (~0.1 sec resolution) "strong" labels for a portion of the AudioSet dataset …

Cited by 101 Related articles All 6 versions

A survey on preprocessing and classification techniques for acoustic scene

VK Singh, K Sharma, SN Sur - Expert Systems with Applications, 2023 - Elsevier

There are lots of research papers for ASC, and in recent years it is rapidly increasing.
DCASE also provides different types of competition for the submission of several papers to …

Cited by 7 Related articles All 2 versions

[PDF] arxiv.org

Audio retrieval with natural language queries

AM Oncescu, A Koepke, JF Henriques, Z Akata… - arXiv preprint arXiv …, 2021 - arxiv.org

We consider the task of retrieving audio using free-form natural language queries. To study
this problem, which has received limited attention in the existing literature, we introduce …

Cited by 80 Related articles All 13 versions

[PDF] arxiv.org

Unsupervised contrastive learning of sound event representations

E Fonseca, D Ortego, K McGuinness… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Self-supervised representation learning can mitigate the limitations in recognition tasks with
few manually labeled data but abundant unlabeled data—a common scenario in sound …

Cited by 72 Related articles All 8 versions

[PDF] mdpi.com

Underwater acoustic target recognition based on depthwise separable convolution neural networks

G Hu, K Wang, L Liu - Sensors, 2021 - mdpi.com

Facing the complex marine environment, it is extremely challenging to conduct underwater
acoustic target feature extraction and recognition using ship-radiated noise. In this paper …

Cited by 60 Related articles All 9 versions

[PDF] arxiv.org

Receptive field regularization techniques for audio classification and tagging with deep convolutional neural networks

K Koutini, H Eghbal-zadeh… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org

In this paper, we study the performance of variants of well-known Convolutional Neural
Network (CNN) architectures on different audio tasks. We show that tuning the Receptive …

Cited by 52 Related articles All 4 versions

[PDF] arxiv.org

Improving bird classification with unsupervised sound separation

T Denton, S Wisdom, JR Hershey - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

This paper addresses the problem of species classification in bird song recordings. The
massive amount of available field recordings of birds presents an opportunity to use …

Cited by 51 Related articles All 5 versions

[PDF] arxiv.org

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

Q Kong, Y Cao, T Iqbal, Y Xu, W Wang… - arXiv preprint arXiv …, 2019 - arxiv.org

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge
focuses on audio tagging, sound event detection and spatial localisation. DCASE 2019 …

Cited by 99 Related articles All 6 versions