Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data
Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in
the field of computer vision, speech, natural language processing (NLP), and recently, with …
Speech self-supervised representation benchmarking: Are we doing it right?
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled
speech signals to reach impressive performance on speech tasks using only small amounts …
Improved active multi-task representation learning via lasso
To leverage the copious amount of data from source tasks and overcome the scarcity of the
target task samples, representation learning based on multi-task pretraining has become a …
Speech self-supervised representations benchmarking: a case for larger probing heads
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach
impressive performance with reduced amounts of annotated data. The high number of …
Losses can be blessings: Routing self-supervised speech representations towards efficient multilingual and multitask speech processing
Self-supervised learning (SSL) for rich speech representations has achieved empirical
success in low-resource Automatic Speech Recognition (ASR) and other speech processing …
Fine-tuning strategies for faster inference using speech self-supervised models: a comparative study
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech
Recognition (ASR) performance in low-resource settings. In this context, it has been …
Creating musical features using multi-faceted, multi-task encoders based on transformers
Computational machine intelligence approaches have enabled a variety of music-centric
technologies in support of creating, sharing and interacting with music content. A strong …
Benchmarking Representations for Speech, Music, and Acoustic Events
Limited diversity in standardized benchmarks for evaluating audio representation learning
(ARL) methods may hinder systematic comparison of current methods' capabilities. We …
Facial expression recognition based on zero-addition pretext training and feature conjunction-selection network in human-robot interaction
CS Jiang, ZT Liu, J She - IEEE Sensors Journal, 2023 - ieeexplore.ieee.org
The design of the feature extraction process and training strategy are crucial aspects of
achieving high-performance facial expression recognition (FER). Although the introduction …
Sound and visual representation learning with multiple pretraining tasks
AB Vasudevan, D Dai… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Different self-supervised tasks (SSL) reveal different features from the data. The learned
feature representations can exhibit different performance for each downstream task. In this …