Massively multilingual ASR: 50 languages, 1 model, 1 billion parameters

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 338 · All 7 versions

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org

We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

Cited by 90 · All 7 versions

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 2368 · All 11 versions

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org

Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Cited by 176 · All 3 versions

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arXiv preprint arXiv …, 2022 - arxiv.org

Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …

Cited by 802 · All 4 versions

Finetuned language models are zero-shot learners

J Wei, M Bosma, VY Zhao, K Guu, AW Yu… - arXiv preprint arXiv …, 2021 - arxiv.org

This paper explores a simple method for improving the zero-shot learning abilities of
language models. We show that instruction tuning--finetuning language models on a …

Cited by 2593 · All 6 versions

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 88 · All 6 versions

Room-across-room: Multilingual vision-and-language navigation with dense spatiotemporal grounding

A Ku, P Anderson, R Patel, E Ie, J Baldridge - arXiv preprint arXiv …, 2020 - arxiv.org

We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …

Cited by 251 · All 6 versions

Simple and effective zero-shot cross-lingual phoneme recognition

Q Xu, A Baevski, M Auli - arXiv preprint arXiv:2109.11680, 2021 - arxiv.org

Recent progress in self-training, self-supervised pretraining and unsupervised learning
enabled well performing speech recognition systems without any labeled data. However, in …

Cited by 68 · All 6 versions

Multilingual and code-switching ASR challenges for low resource Indian languages

A Diwan, R Vaideeswaran, S Shah, A Singh… - arXiv preprint arXiv …, 2021 - arxiv.org

Recently, there is increasing interest in multilingual automatic speech recognition (ASR)
where a speech recognition system caters to multiple low resource languages by taking …

Cited by 81 · All 12 versions