Masked modeling duo: Learning representations by encouraging both networks to model the input
D Niizumi, D Takeuchi, Y Ohishi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Masked Autoencoders (MAE) is a simple yet powerful self-supervised learning method. However, it
learns representations indirectly by reconstructing masked input patches. Several methods …
Benchmarking Representations for Speech, Music, and Acoustic Events
Limited diversity in standardized benchmarks for evaluating audio representation learning
(ARL) methods may hinder systematic comparison of current methods' capabilities. We …
Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework
D Niizumi, D Takeuchi, Y Ohishi… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Self-supervised learning (SSL) using masked prediction has made great strides in general-
purpose audio representation. This study proposes Masked Modeling Duo (M2D), an …
MAST: Multiscale audio spectrogram transformers
We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification,
which brings the concept of multiscale feature hierarchies to the Audio Spectrogram …
Classification and Retrieval of Multimedia Audio Learning Resources
W Zhang - International Journal of Emerging Technologies in …, 2023 - learntechlib.org
With the development of the Internet and new media, multimedia and audio learning
resources have been widely used in teaching and learning. However, their classification and …
SLICER: Learning universal audio representations using low-resource self-supervised pre-training
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on
unlabeled audio data that reduces the need for large amounts of labeled data for audio and …
FusDom: Combining in-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning
Continued pre-training (CP) offers multiple advantages, like target domain adaptation and
the potential to exploit the continuous stream of unlabeled data available online. However …
TOWARDS LEARNING A DIFFERENCE-AWARE GENERAL-PURPOSE AUDIO REPRESENTATION
General-purpose audio representations with self-supervised learning have shown promising
results on diverse tasks. Methods such as BYOL-A try to learn semantically robust …