Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data

S Deldari, H Xue, A Saeed, J He, DV Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in
the field of computer vision, speech, natural language processing (NLP), and recently, with …

Speech self-supervised representation benchmarking: Are we doing it right?

S Zaiem, Y Kemiche, T Parcollet, S Essid… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled
speech signals to reach impressive performance on speech tasks using only small amounts …

Improved active multi-task representation learning via lasso

Y Wang, Y Chen, K Jamieson… - … Conference on Machine …, 2023 - proceedings.mlr.press
To leverage the copious amount of data from source tasks and overcome the scarcity of the
target task samples, representation learning based on multi-task pretraining has become a …

Speech self-supervised representations benchmarking: a case for larger probing heads

S Zaiem, Y Kemiche, T Parcollet, S Essid… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach
impressive performance with reduced amounts of annotated data. The high number of …

Losses can be blessings: Routing self-supervised speech representations towards efficient multilingual and multitask speech processing

Y Fu, Y Zhang, K Qian, Z Ye, Z Yu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Self-supervised learning (SSL) for rich speech representations has achieved empirical
success in low-resource Automatic Speech Recognition (ASR) and other speech processing …

Fine-tuning strategies for faster inference using speech self-supervised models: a comparative study

S Zaiem, R Algayres, T Parcollet… - … , Speech, and Signal …, 2023 - ieeexplore.ieee.org
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech
Recognition (ASR) performance in low-resource settings. In this context, it has been …

Creating musical features using multi-faceted, multi-task encoders based on transformers

T Greer, X Shi, B Ma, S Narayanan - Scientific Reports, 2023 - nature.com
Computational machine intelligence approaches have enabled a variety of music-centric
technologies in support of creating, sharing and interacting with music content. A strong …

Benchmarking Representations for Speech, Music, and Acoustic Events

M La Quatra, A Koudounas, L Vaiani, E Baralis… - arXiv preprint arXiv …, 2024 - arxiv.org
Limited diversity in standardized benchmarks for evaluating audio representation learning
(ARL) methods may hinder systematic comparison of current methods' capabilities. We …

Facial expression recognition based on zero-addition pretext training and feature conjunction-selection network in human-robot interaction

CS Jiang, ZT Liu, J She - IEEE Sensors Journal, 2023 - ieeexplore.ieee.org
The design of the feature extraction process and training strategy are crucial aspects of
achieving high-performance facial expression recognition (FER). Although the introduction …

Sound and visual representation learning with multiple pretraining tasks

AB Vasudevan, D Dai… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Different self-supervised tasks (SSL) reveal different features from the data. The learned
feature representations can exhibit different performance for each downstream task. In this …