Masked modeling duo: Learning representations by encouraging both networks to model the input
D Niizumi, D Takeuchi, Y Ohishi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Masked Autoencoders (MAE) is a simple yet powerful self-supervised learning method. However, it
learns representations indirectly by reconstructing masked input patches. Several methods …
Benchmarking Representations for Speech, Music, and Acoustic Events
Limited diversity in standardized benchmarks for evaluating audio representation learning
(ARL) methods may hinder systematic comparison of current methods' capabilities. We …
Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework
D Niizumi, D Takeuchi, Y Ohishi… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Self-supervised learning (SSL) using masked prediction has made great strides in general-
purpose audio representation. This study proposes Masked Modeling Duo (M2D), an …
MAST: Multiscale audio spectrogram transformers
We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification,
which brings the concept of multiscale feature hierarchies to the Audio Spectrogram …
Classification and Retrieval of Multimedia Audio Learning Resources
W Zhang - International Journal of Emerging Technologies in …, 2023 - learntechlib.org
With the development of the Internet and new media, multimedia and audio learning
resources have been widely used in teaching and learning. However, their classification and …
SLICER: Learning universal audio representations using low-resource self-supervised pre-training
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on
unlabeled audio data that reduces the need for large amounts of labeled data for audio and …
FusDom: Combining in-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning
Continued pre-training (CP) offers multiple advantages, like target domain adaptation and
the potential to exploit the continuous stream of unlabeled data available online. However …
TOWARDS LEARNING A DIFFERENCE-AWARE GENERAL-PURPOSE AUDIO REPRESENTATION
General-purpose audio representations with self-supervised learning have shown promising
results on diverse tasks. Methods such as BYOL-A try to learn semantically robust …