Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Sound event detection: A tutorial

A Mesaros, T Heittola, T Virtanen… - IEEE Signal …, 2021 - ieeexplore.ieee.org
Imagine standing on a street corner in the city. With your eyes closed you can hear and
recognize a succession of sounds: cars passing by, people speaking, their footsteps when …

Fsd50k: an open dataset of human-labeled sound events

E Fonseca, X Favory, J Pons, F Font… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-
specific, with the exception of AudioSet, based on over 2 M tracks from YouTube videos and …

Panns: Large-scale pretrained audio neural networks for audio pattern recognition

Q Kong, Y Cao, T Iqbal, Y Wang… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org
Audio pattern recognition is an important research topic in the machine learning area, and
includes several tasks such as audio tagging, acoustic scene classification, music …

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org
The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Vggsound: A large-scale audio-visual dataset

H Chen, W Xie, A Vedaldi… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Our goal is to collect a large-scale audio-visual dataset with low label noise from videosin
the wild'using computer vision techniques. The resulting dataset can be used for training …

[HTML][HTML] Machine learning in acoustics: Theory and applications

MJ Bianco, P Gerstoft, J Traer, E Ozanich… - The Journal of the …, 2019 - pubs.aip.org
Acoustic data provide scientific and engineering insights in fields ranging from biology and
communications to ocean and Earth science. We survey the recent advances and …

Audio set: An ontology and human-labeled dataset for audio events

JF Gemmeke, DPW Ellis, D Freedman… - … on acoustics, speech …, 2017 - ieeexplore.ieee.org
Audio event recognition, the human-like ability to identify and relate sounds from audio, is a
nascent problem in machine perception. Comparable problems such as object detection in …

Deep transfer learning for automatic speech recognition: Towards better generalization

H Kheddar, Y Himeur, S Al-Maadeed, A Amira… - Knowledge-Based …, 2023 - Elsevier
Automatic speech recognition (ASR) has recently become an important challenge when
using deep learning (DL). It requires large-scale training datasets and high computational …

CNN architectures for large-scale audio classification

S Hershey, S Chaudhuri, DPW Ellis… - … on acoustics, speech …, 2017 - ieeexplore.ieee.org
Convolutional Neural Networks (CNNs) have proven very effective in image classification
and show promise for audio. We use various CNN architectures to classify the soundtracks …