FLEURS: Few-shot learning evaluation of universal representations of speech

A Conneau, M Ma, S Khanuja, Y Zhang… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of
Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on …

Speech and language processing with deep learning for dementia diagnosis: A systematic review

M Shi, G Cheung, SR Shahamiri - Psychiatry Research, 2023 - Elsevier
Dementia is a progressive neurodegenerative disease that burdens the person living with
the disease, their families, and medical and social services. Timely diagnosis of dementia …

Real-time neural radiance talking portrait synthesis via audio-spatial decomposition

J Tang, K Wang, H Zhou, X Chen, D He, T Hu… - arXiv preprint arXiv …, 2022 - arxiv.org
While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D
modeling of talking portraits, the slow training and inference speed severely obstruct their …

A comparison of discrete and soft speech units for improved voice conversion

B Van Niekerk, MA Carbonneau, J Zaïdi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The goal of voice conversion is to transform source speech into a target voice, keeping the
content unchanged. In this paper, we focus on self-supervised representation learning for …

Simple and effective zero-shot cross-lingual phoneme recognition

Q Xu, A Baevski, M Auli - arXiv preprint arXiv:2109.11680, 2021 - arxiv.org
Recent progress in self-training, self-supervised pretraining and unsupervised learning
enabled well performing speech recognition systems without any labeled data. However, in …

Improving massively multilingual ASR with auxiliary CTC objectives

W Chen, B Yan, J Shi, Y Peng, S Maiti… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Multilingual Automatic Speech Recognition (ASR) models have extended the usability of
speech technologies to a wide variety of languages. With how many languages these …

Language ID in the wild: Unexpected challenges on the path to a thousand-language web text corpus

I Caswell, T Breiner, D Van Esch, A Bapna - arXiv preprint arXiv …, 2020 - arxiv.org
Large text corpora are increasingly important for a wide variety of Natural Language
Processing (NLP) tasks, and automatic language identification (LangID) is a core technology …

Multilingual speech recognition for Turkic languages

S Mussakhojayeva, K Dauletbek, R Yeshpanov… - Information, 2023 - mdpi.com
The primary aim of this study was to contribute to the development of multilingual automatic
speech recognition for lower-resourced Turkic languages. Ten languages—Azerbaijani …

ASR2K: Speech recognition for around 2000 languages without audio

X Li, F Metze, DR Mortensen, AW Black… - arXiv preprint arXiv …, 2022 - arxiv.org
Most recent speech recognition models rely on large supervised datasets, which are
unavailable for many low-resource languages. In this work, we present a speech recognition …

Low Resource ASR: The Surprising Effectiveness of High Resource Transliteration

S Khare, AR Mittal, A Diwan, S Sarawagi, P Jyothi… - Interspeech, 2021 - isca-archive.org
Cross-lingual transfer of knowledge from high-resource languages to low-resource
languages is an important research problem in automatic speech recognition (ASR). We …