ML-SUPERB: Multilingual speech universal performance benchmark
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …
Whisper-SV: Adapting Whisper for low-data-resource speaker verification
Trained on 680,000 h of massive speech data, Whisper is a multitasking, multilingual
speech foundation model demonstrating superior performance in automatic speech …
Comparison of multilingual self-supervised and weakly-supervised speech pre-training for adaptation to unseen languages
Recent models such as XLS-R and Whisper have made multilingual speech technologies
more accessible by pre-training on audio from around 100 spoken languages each …
AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …
Speech self-supervised representations benchmarking: a case for larger probing heads
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach
impressive performance with reduced amounts of annotated data. The high number of …
Findings of the 2023 ML-SUPERB challenge: Pre-training and evaluation over more languages and beyond
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …
How to estimate model transferability of pre-trained speech models?
In this work, we introduce a "score-based assessment" framework for estimating the
transferability of pre-trained speech models (PSMs) for fine-tuning target tasks. We leverage …
A semi-supervised complementary joint training approach for low-resource speech recognition
Both unpaired speech and text have been shown to be beneficial for low-resource automatic
speech recognition (ASR), which, however, were either separately used for pre-training, self …
EFFUSE: Efficient self-supervised feature fusion for E2E ASR in multilingual and low resource scenarios
Self-Supervised Learning (SSL) models have demonstrated exceptional performance in
various speech tasks, particularly in low-resource and multilingual domains. Recent works …
Task-agnostic structured pruning of speech representation models
H Wang, S Wang, WQ Zhang, H Suo, Y Wan - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been
shown to significantly improve many speech tasks. However, their large memory and strong …