Masked modeling duo: Learning representations by encouraging both networks to model the input
D Niizumi, D Takeuchi, Y Ohishi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
The Masked Autoencoder (MAE) is a simple yet powerful self-supervised learning method. However, it
learns representations indirectly by reconstructing masked input patches. Several methods …
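The "indirect" learning mentioned here is the MAE recipe of encoding only the visible patches and scoring a decoder's reconstruction of the masked ones. A minimal sketch of that loss (a reading of the general MAE approach, not the authors' code; encoder and decoder are placeholder modules):

import torch
import torch.nn as nn

def mae_step(encoder: nn.Module, decoder: nn.Module, patches: torch.Tensor, mask_ratio: float = 0.75):
    """MAE-style reconstruction loss on spectrogram patches of shape (batch, num_patches, patch_dim)."""
    b, n, d = patches.shape
    num_keep = int(n * (1.0 - mask_ratio))

    # Pick a random subset of visible patches per example.
    noise = torch.rand(b, n, device=patches.device)
    ids_shuffle = noise.argsort(dim=1)
    ids_keep, ids_masked = ids_shuffle[:, :num_keep], ids_shuffle[:, num_keep:]

    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))
    latent = encoder(visible)            # only visible patches are encoded
    pred = decoder(latent)               # decoder is assumed to predict the (n - num_keep) masked patches

    target = torch.gather(patches, 1, ids_masked.unsqueeze(-1).expand(-1, -1, d))
    return ((pred - target) ** 2).mean()  # loss is taken on masked patches only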
Benchmarking Representations for Speech, Music, and Acoustic Events
Limited diversity in standardized benchmarks for evaluating audio representation learning
(ARL) methods may hinder systematic comparison of current methods' capabilities. We …
Noise robust distillation of self-supervised speech models via correlation metrics
Compared to large speech foundation models, small student models exhibit degraded noise
robustness. The student's robustness can be improved by introducing noise at the inputs …
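The snippet points at the core trick of training the student on noisy inputs while the teacher sees clean speech; a rough sketch of such a distillation step (the paper's correlation-based metrics are not reproduced here, and the cosine objective below is a stand-in):

import torch
import torch.nn.functional as F

def noisy_distill_step(teacher, student, clean_wave, noise, snr_db=10.0):
    """clean_wave, noise: (batch, samples); teacher/student map waveforms to (batch, frames, dim) features."""
    # Mix noise into the student's input at the requested SNR.
    sig_pow = clean_wave.pow(2).mean(dim=1, keepdim=True)
    noise_pow = noise.pow(2).mean(dim=1, keepdim=True).clamp_min(1e-8)
    scale = torch.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    noisy_wave = clean_wave + scale * noise

    with torch.no_grad():
        target = teacher(clean_wave)     # clean targets from the frozen teacher
    pred = student(noisy_wave)           # the student only ever sees the noisy version

    # Stand-in objective; the paper distills with correlation metrics instead.
    return 1 - F.cosine_similarity(pred, target, dim=-1).mean()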
Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework
D Niizumi, D Takeuchi, Y Ohishi… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Self-supervised learning (SSL) using masked prediction has made great strides in general-
purpose audio representation. This study proposes Masked Modeling Duo (M2D), an …
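Going by the abstract alone, the "duo" can be pictured as an online encoder plus predictor trained to match what a momentum (EMA) target encoder produces from the complementary masked patches; the pooling, loss, and update rule below are illustrative assumptions, not the paper's exact design:

import copy
import torch
import torch.nn.functional as F

class M2DSketch(torch.nn.Module):
    def __init__(self, encoder, predictor, ema_decay=0.99):
        super().__init__()
        self.online_encoder = encoder
        self.predictor = predictor
        self.target_encoder = copy.deepcopy(encoder)   # momentum copy of the online encoder
        for p in self.target_encoder.parameters():
            p.requires_grad_(False)
        self.ema_decay = ema_decay

    def loss(self, visible_patches, masked_patches):
        # Both networks model the input: online from visible patches, target from masked ones.
        online = self.predictor(self.online_encoder(visible_patches).mean(dim=1))
        with torch.no_grad():
            target = self.target_encoder(masked_patches).mean(dim=1)
        return 1 - F.cosine_similarity(online, target, dim=-1).mean()

    @torch.no_grad()
    def update_target(self):
        # Exponential moving average of online weights into the target encoder.
        for po, pt in zip(self.online_encoder.parameters(), self.target_encoder.parameters()):
            pt.mul_(self.ema_decay).add_((1 - self.ema_decay) * po)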
LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource …
K Fatehi, A Kucukyilmaz - Proc. INTERSPEECH, 2023 - isca-archive.org
With advances in deep learning methodologies, Automatic Speech Recognition (ASR)
systems have seen impressive results. However, ASR in Low-Resource Environments …
MAST: Multiscale audio spectrogram transformers
We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification,
which brings the concept of multiscale feature hierarchies to the Audio Spectrogram …
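A multiscale feature hierarchy in a spectrogram transformer typically means progressively merging patch tokens while widening the embedding from stage to stage; the toy stage below illustrates that general idea only and is not MAST's actual architecture:

import torch
import torch.nn as nn

class MultiscaleStage(nn.Module):
    def __init__(self, dim_in, dim_out, num_heads=4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=dim_in, nhead=num_heads, batch_first=True)
        self.pool = nn.Conv1d(dim_in, dim_out, kernel_size=2, stride=2)  # merge adjacent tokens

    def forward(self, x):                                   # x: (batch, tokens, dim_in)
        x = self.block(x)
        return self.pool(x.transpose(1, 2)).transpose(1, 2)  # (batch, tokens // 2, dim_out)

# Example: three stages turn 256 patch tokens of width 96 into 32 tokens of width 384.
stages = nn.Sequential(MultiscaleStage(96, 192), MultiscaleStage(192, 384, 8), MultiscaleStage(384, 384, 8))
tokens = torch.randn(2, 256, 96)
print(stages(tokens).shape)                                 # torch.Size([2, 32, 384])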
Classification and Retrieval of Multimedia Audio Learning Resources
W Zhang - International Journal of Emerging Technologies in …, 2023 - learntechlib.org
With the development of the Internet and new media, multimedia and audio learning
resources have been widely used in teaching and learning. However, their classification and …
SLICER: Learning universal audio representations using low-resource self-supervised pre-training
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on
unlabeled audio data that reduces the need for large amounts of labeled data for audio and …
FusDom: Combining in-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning
Continued pre-training (CP) offers multiple advantages, like target domain adaptation and
the potential to exploit the continuous stream of unlabeled data available online. However …
WaveBYOL: Self-Supervised Learning for Audio Representation From Raw Waveforms
S Kim, YH Choi - IEEE Access, 2023 - ieeexplore.ieee.org
In this paper, we propose the WaveBYOL model, which can learn general-purpose audio
representations directly from raw waveforms based on the bootstrap your own latent (BYOL) …
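The BYOL recipe named in the snippet trains an online network and predictor to match an EMA target network across two augmented views of the same clip; the waveform augmentations and modules below are placeholders, not WaveBYOL's components:

import torch
import torch.nn.functional as F

def augment(wave):
    """Toy waveform augmentation: random gain plus a random circular shift."""
    gain = torch.empty(wave.shape[0], 1, device=wave.device).uniform_(0.5, 1.5)
    shift = int(torch.randint(0, wave.shape[1], (1,)))
    return torch.roll(wave * gain, shifts=shift, dims=1)

def byol_loss(online, predictor, target, wave):
    v1, v2 = augment(wave), augment(wave)          # two views of the same raw-waveform clip
    p1, p2 = predictor(online(v1)), predictor(online(v2))
    with torch.no_grad():                          # target network is an EMA copy of the online one
        t1, t2 = target(v1), target(v2)
    sym = F.cosine_similarity(p1, t2, dim=-1) + F.cosine_similarity(p2, t1, dim=-1)
    return (2 - sym).mean()                        # symmetric negative cosine similarity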