ML-SUPERB: Multilingual speech universal performance benchmark

J Shi, D Berrebbi, W Chen, HL Chung, EP Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …

Speech self-supervised representation benchmarking: Are we doing it right?

S Zaiem, Y Kemiche, T Parcollet, S Essid… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled
speech signals to reach impressive performance on speech tasks using only small amounts …

CMOT: Cross-modal mixup via optimal transport for speech translation

Y Zhou, Q Fang, Y Feng - arXiv preprint arXiv:2305.14635, 2023 - arxiv.org
End-to-end speech translation (ST) is the task of translating speech signals in the source
language into text in the target language. As a cross-modal task, end-to-end ST is difficult to …

SpeechGen: Unlocking the generative power of speech language models with prompts

H Wu, KW Chang, YK Wu, H Lee - arXiv preprint arXiv:2306.02207, 2023 - arxiv.org
Large language models (LLMs) have gained considerable attention for Artificial Intelligence
Generated Content (AIGC), particularly with the emergence of ChatGPT. However, the direct …

Speech self-supervised representations benchmarking: a case for larger probing heads

S Zaiem, Y Kemiche, T Parcollet, S Essid… - Computer Speech & …, 2025 - Elsevier
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach
impressive performance with reduced amounts of annotated data. The high number of …

Findings of the 2023 ML-SUPERB challenge: Pre-training and evaluation over more languages and beyond

J Shi, W Chen, D Berrebbi, HH Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …

Ensemble knowledge distillation of self-supervised speech models

KP Huang, TH Feng, YK Fu, TY Hsu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Distilled self-supervised models have shown competitive performance and efficiency in
recent years. However, there is a lack of experience in jointly distilling multiple self …

AV-SUPERB: A multi-task evaluation benchmark for audio-visual representation models

Y Tseng, L Berry, YT Chen, IH Chiu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Audio-visual representation learning aims to develop systems with human-like perception by
utilizing correlation between auditory and visual information. However, current models often …

Exploration on HuBERT with multiple resolutions

J Shi, Y Tang, H Inaguma, H Gong, J Pino… - arXiv preprint arXiv …, 2023 - arxiv.org
Hidden-unit BERT (HuBERT) is a widely used self-supervised learning (SSL) model in
speech processing. However, we argue that its fixed 20ms resolution for hidden …

EMO-SUPERB: An in-depth look at speech emotion recognition

H Wu, HC Chou, KW Chang, L Goncalves, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Speech emotion recognition (SER) is a pivotal technology for human-computer interaction
systems. However, 80.77% of SER papers yield results that cannot be reproduced. We …