ML-SUPERB: Multilingual speech universal performance benchmark

J Shi, D Berrebbi, W Chen, HL Chung, EP Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …

Whisper-SV: Adapting Whisper for low-data-resource speaker verification

L Zhang, N Jiang, Q Wang, Y Li, Q Lu, L Xie - Speech Communication, 2024 - Elsevier
Trained on 680,000 h of massive speech data, Whisper is a multitasking, multilingual
speech foundation model demonstrating superior performance in automatic speech …

Comparison of multilingual self-supervised and weakly-supervised speech pre-training for adaptation to unseen languages

A Rouditchenko, S Khurana, S Thomas, R Feris… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent models such as XLS-R and Whisper have made multilingual speech technologies
more accessible by pre-training on audio from around 100 spoken languages each …

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

JH Yeo, M Kim, J Choi, DH Kim… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …

Speech self-supervised representations benchmarking: a case for larger probing heads

S Zaiem, Y Kemiche, T Parcollet, S Essid… - Computer Speech & …, 2025 - Elsevier
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach
impressive performance with reduced amounts of annotated data. The high number of …

Findings of the 2023 ML-SUPERB challenge: Pre-training and evaluation over more languages and beyond

J Shi, W Chen, D Berrebbi, HH Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …

How to estimate model transferability of pre-trained speech models?

ZC Chen, CHH Yang, B Li, Y Zhang, N Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we introduce a "score-based assessment" framework for estimating the
transferability of pre-trained speech models (PSMs) for fine-tuning target tasks. We leverage …

A semi-supervised complementary joint training approach for low-resource speech recognition

YQ Du, J Zhang, X Fang, MH Wu… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Both unpaired speech and text have been shown to be beneficial for low-resource automatic
speech recognition (ASR), which, however, were either separately used for pre-training, self …

EFFUSE: Efficient self-supervised feature fusion for E2E ASR in multilingual and low resource scenarios

T Srivastava, J Shi, W Chen, S Watanabe - arXiv preprint arXiv …, 2023 - arxiv.org
Self-Supervised Learning (SSL) models have demonstrated exceptional performance in
various speech tasks, particularly in low-resource and multilingual domains. Recent works …

Task-agnostic structured pruning of speech representation models

H Wang, S Wang, WQ Zhang, H Suo, Y Wan - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised pre-trained models such as wav2vec2, HuBERT, and WavLM have been
shown to significantly improve many speech tasks. However, their large memory and strong …