Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org

Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Cited by 275 · Related articles · All 3 versions

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

Cited by 284 · Related articles · All 3 versions

ML-SUPERB: Multilingual speech universal performance benchmark

J Shi, D Berrebbi, W Chen, HL Chung, EP Hu… - arXiv preprint arXiv …, 2023 - arxiv.org

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …

Cited by 57 · Related articles · All 8 versions

Automatic speech recognition for Uyghur, Kazakh, and Kyrgyz: An overview

W Du, Y Maimaitiyiming, M Nijat, L Li, A Hamdulla… - Applied Sciences, 2022 - mdpi.com

With the emergence of deep learning, the performance of automatic speech recognition
(ASR) systems has remarkably improved. Especially for resource-rich languages such as …

Cited by 14 · Related articles · All 3 versions

Improving massively multilingual ASR with auxiliary CTC objectives

W Chen, B Yan, J Shi, Y Peng, S Maiti… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Multilingual Automatic Speech Recognition (ASR) models have extended the usability of
speech technologies to a wide variety of languages. With how many languages these …

Cited by 41 · Related articles · All 5 versions

Textless direct speech-to-speech translation with discrete speech representation

X Li, Y Jia, CC Chiu - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

Research on speech-to-speech translation (S2ST) has progressed rapidly in recent years.
Many end-to-end systems have been proposed and show advantages over conventional …

Cited by 22 · Related articles · All 3 versions

Findings of the 2023 ML-SUPERB Challenge: Pre-training and evaluation over more languages and beyond

J Shi, W Chen, D Berrebbi, HH Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …

Cited by 8 · Related articles · All 5 versions

Towards zero-shot code-switched speech recognition

B Yan, M Wiesner, O Klejch, P Jyothi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

In this work, we seek to build effective code-switched (CS) automatic speech recognition
systems (ASR) under the zero-shot setting where no transcribed CS speech data is …

Cited by 19 · Related articles · All 6 versions

Learning to speak from text: Zero-shot multilingual text-to-speech with unsupervised text pretraining

T Saeki, S Maiti, X Li, S Watanabe, S Takamichi… - arXiv preprint arXiv …, 2023 - arxiv.org

While neural text-to-speech (TTS) has achieved human-like natural synthetic speech,
multilingual TTS systems are limited to resource-rich languages due to the need for paired …

Cited by 12 · Related articles · All 9 versions

The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese

A Kulkarni, A Tokareva, R Qureshi… - arXiv preprint arXiv …, 2024 - arxiv.org

In the field of spoken language understanding, systems like Whisper and Multilingual
Massive Speech (MMS) have shown state-of-the-art performances. This study is dedicated …

Cited by 6 · Related articles · All 7 versions