ML-SUPERB: Multilingual speech universal performance benchmark
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …
Speech self-supervised representation benchmarking: Are we doing it right?
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled
speech signals to reach impressive performance on speech tasks using only small amounts …
CMOT: Cross-modal mixup via optimal transport for speech translation
End-to-end speech translation (ST) is the task of translating speech signals in the source
language into text in the target language. As a cross-modal task, end-to-end ST is difficult to …
SpeechGen: Unlocking the generative power of speech language models with prompts
Large language models (LLMs) have gained considerable attention for Artificial Intelligence
Generated Content (AIGC), particularly with the emergence of ChatGPT. However, the direct …
Speech self-supervised representations benchmarking: a case for larger probing heads
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach
impressive performance with reduced amounts of annotated data. The high number of …
Findings of the 2023 ML-SUPERB Challenge: Pre-training and evaluation over more languages and beyond
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …
Ensemble knowledge distillation of self-supervised speech models
Distilled self-supervised models have shown competitive performance and efficiency in
recent years. However, there is a lack of experience in jointly distilling multiple self …
AV-SUPERB: A multi-task evaluation benchmark for audio-visual representation models
Audio-visual representation learning aims to develop systems with human-like perception by
utilizing correlation between auditory and visual information. However, current models often …
Exploration on HuBERT with multiple resolutions
Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL) model in
speech processing. However, we argue that its fixed 20ms resolution for hidden …
EMO-SUPERB: An in-depth look at speech emotion recognition
Speech emotion recognition (SER) is a pivotal technology for human-computer interaction
systems. However, 80.77% of SER papers yield results that cannot be reproduced. We …