ML-SUPERB: Multilingual speech universal performance benchmark

J Shi, D Berrebbi, W Chen, HL Chung, EP Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …

Whisper-SV: Adapting Whisper for low-data-resource speaker verification

L Zhang, N Jiang, Q Wang, Y Li, Q Lu, L Xie - Speech Communication, 2024 - Elsevier
Trained on 680,000 h of massive speech data, Whisper is a multitasking, multilingual
speech foundation model demonstrating superior performance in automatic speech …

Comparison of multilingual self-supervised and weakly-supervised speech pre-training for adaptation to unseen languages

A Rouditchenko, S Khurana, S Thomas, R Feris… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent models such as XLS-R and Whisper have made multilingual speech technologies
more accessible by pre-training on audio from around 100 spoken languages each …

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

JH Yeo, M Kim, J Choi, DH Kim… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …

Speech self-supervised representations benchmarking: a case for larger probing heads

S Zaiem, Y Kemiche, T Parcollet, S Essid… - Computer Speech & …, 2025 - Elsevier
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach
impressive performance with reduced amounts of annotated data. The high number of …

Findings of the 2023 ML-SUPERB challenge: Pre-training and evaluation over more languages and beyond

J Shi, W Chen, D Berrebbi, HH Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …

How to estimate model transferability of pre-trained speech models?

ZC Chen, CHH Yang, B Li, Y Zhang, N Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we introduce a "score-based assessment" framework for estimating the
transferability of pre-trained speech models (PSMs) for fine-tuning target tasks. We leverage …

A semi-supervised complementary joint training approach for low-resource speech recognition

YQ Du, J Zhang, X Fang, MH Wu… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Both unpaired speech and text have been shown to be beneficial for low-resource automatic
speech recognition (ASR), which, however, were either separately used for pre-training, self …

EFFUSE: Efficient self-supervised feature fusion for E2E ASR in multilingual and low resource scenarios

T Srivastava, J Shi, W Chen, S Watanabe - arXiv preprint arXiv …, 2023 - arxiv.org
Self-Supervised Learning (SSL) models have demonstrated exceptional performance in
various speech tasks, particularly in low-resource and multilingual domains. Recent works …

Task-agnostic structured pruning of speech representation models

H Wang, S Wang, WQ Zhang, H Suo, Y Wan - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised pre-trained models such as wav2vec2, HuBERT, and WavLM have been
shown to significantly improve many speech tasks. However, their large memory and strong …