Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

ML-SUPERB: Multilingual speech universal performance benchmark

J Shi, D Berrebbi, W Chen, HL Chung, EP Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …

Automatic speech recognition for Uyghur, Kazakh, and Kyrgyz: An overview

W Du, Y Maimaitiyiming, M Nijat, L Li, A Hamdulla… - Applied Sciences, 2022 - mdpi.com
With the emergence of deep learning, the performance of automatic speech recognition
(ASR) systems has remarkably improved. Especially for resource-rich languages such as …

Improving massively multilingual ASR with auxiliary CTC objectives

W Chen, B Yan, J Shi, Y Peng, S Maiti… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Multilingual Automatic Speech Recognition (ASR) models have extended the usability of
speech technologies to a wide variety of languages. With how many languages these …

Textless direct speech-to-speech translation with discrete speech representation

X Li, Y Jia, CC Chiu - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Research on speech-to-speech translation (S2ST) has progressed rapidly in recent years.
Many end-to-end systems have been proposed and show advantages over conventional …

Findings of the 2023 ML-SUPERB challenge: Pre-training and evaluation over more languages and beyond

J Shi, W Chen, D Berrebbi, HH Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …

Towards zero-shot code-switched speech recognition

B Yan, M Wiesner, O Klejch, P Jyothi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this work, we seek to build effective code-switched (CS) automatic speech recognition
systems (ASR) under the zero-shot setting where no transcribed CS speech data is …

Learning to speak from text: Zero-shot multilingual text-to-speech with unsupervised text pretraining

T Saeki, S Maiti, X Li, S Watanabe, S Takamichi… - arXiv preprint arXiv …, 2023 - arxiv.org
While neural text-to-speech (TTS) has achieved human-like natural synthetic speech,
multilingual TTS systems are limited to resource-rich languages due to the need for paired …

The balancing act: Unmasking and alleviating ASR biases in Portuguese

A Kulkarni, A Tokareva, R Qureshi… - arXiv preprint arXiv …, 2024 - arxiv.org
In the field of spoken language understanding, systems like Whisper and Multilingual
Massive Speech (MMS) have shown state-of-the-art performances. This study is dedicated …