Masked modeling duo: Learning representations by encouraging both networks to model the input

D Niizumi, D Takeuchi, Y Ohishi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Masked Autoencoders is a simple yet powerful self-supervised learning method. However, it
learns representations indirectly by reconstructing masked input patches. Several methods …

Benchmarking Representations for Speech, Music, and Acoustic Events

M La Quatra, A Koudounas, L Vaiani, E Baralis… - arXiv preprint arXiv …, 2024 - arxiv.org
Limited diversity in standardized benchmarks for evaluating audio representation learning
(ARL) methods may hinder systematic comparison of current methods' capabilities. We …

Noise robust distillation of self-supervised speech models via correlation metrics

F Ritter-Gutierrez, KP Huang, D Ng… - … , Speech, and Signal …, 2024 - ieeexplore.ieee.org
Compared to large speech foundation models, small student models exhibit degraded noise
robustness. The student's robustness can be improved by introducing noise at the inputs …

Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework

D Niizumi, D Takeuchi, Y Ohishi… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Self-supervised learning (SSL) using masked prediction has made great strides in
general-purpose audio representation. This study proposes Masked Modeling Duo (M2D), an …

LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource …

K Fatehi, A Kucukyilmaz - Proc. INTERSPEECH, 2023 - isca-archive.org
With advances in deep learning methodologies, Automatic Speech Recognition (ASR)
systems have seen impressive results. However, ASR in Low-Resource Environments …

MAST: Multiscale audio spectrogram transformers

S Ghosh, A Seth, S Umesh… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification,
which brings the concept of multiscale feature hierarchies to the Audio Spectrogram …

Classification and Retrieval of Multimedia Audio Learning Resources

W Zhang - International Journal of Emerging Technologies in …, 2023 - learntechlib.org
With the development of the Internet and new media, multimedia and audio learning
resources have been widely used in teaching and learning. However, their classification and …

SLICER: Learning universal audio representations using low-resource self-supervised pre-training

A Seth, S Ghosh, S Umesh… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on
unlabeled audio data that reduces the need for large amounts of labeled data for audio and …

FusDom: Combining in-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

A Seth, S Ghosh, S Umesh… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Continued pre-training (CP) offers multiple advantages, like target domain adaptation and
the potential to exploit the continuous stream of unlabeled data available online. However …

WaveBYOL: Self-Supervised Learning for Audio Representation From Raw Waveforms

S Kim, YH Choi - IEEE Access, 2023 - ieeexplore.ieee.org
In this paper, we propose the WaveBYOL model, which can learn general-purpose audio
representations directly from raw waveforms based on the bootstrap your own latent (BYOL) …