Scaling speech technology to 1,000+ languages
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …
Google USM: Scaling automatic speech recognition beyond 100 languages
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …
ML-SUPERB: Multilingual speech universal performance benchmark
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …
Automatic speech recognition for Uyghur, Kazakh, and Kyrgyz: An overview
With the emergence of deep learning, the performance of automatic speech recognition
(ASR) systems has remarkably improved. Especially for resource-rich languages such as …
Improving massively multilingual ASR with auxiliary CTC objectives
Multilingual Automatic Speech Recognition (ASR) models have extended the usability of
speech technologies to a wide variety of languages. With how many languages these …
Textless direct speech-to-speech translation with discrete speech representation
Research on speech-to-speech translation (S2ST) has progressed rapidly in recent years.
Many end-to-end systems have been proposed and show advantages over conventional …
Findings of the 2023 ML-SUPERB challenge: Pre-training and evaluation over more languages and beyond
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …
Towards zero-shot code-switched speech recognition
In this work, we seek to build effective code-switched (CS) automatic speech recognition
systems (ASR) under the zero-shot setting where no transcribed CS speech data is …
Learning to speak from text: Zero-shot multilingual text-to-speech with unsupervised text pretraining
While neural text-to-speech (TTS) has achieved human-like natural synthetic speech,
multilingual TTS systems are limited to resource-rich languages due to the need for paired …
The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese
A Kulkarni, A Tokareva, R Qureshi… - arXiv preprint arXiv …, 2024 - arxiv.org
In the field of spoken language understanding, systems like Whisper and Multilingual
Massive Speech (MMS) have shown state-of-the-art performances. This study is dedicated …