Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

W2v-BERT: Combining contrastive learning and masked language modeling for self-supervised speech pre-training

YA Chung, Y Zhang, W Han, CC Chiu… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Motivated by the success of masked language modeling (MLM) in pre-training natural
language processing models, we propose w2v-BERT that explores MLM for self-supervised …

Rethinking pre-training and self-training

B Zoph, G Ghiasi, TY Lin, Y Cui, H Liu… - Advances in neural …, 2020 - proceedings.neurips.cc
Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet
pre-training is commonly used to initialize the backbones of object detection and …

Pushing the limits of semi-supervised learning for automatic speech recognition

Y Zhang, J Qin, DS Park, W Han, CC Chiu… - arXiv preprint arXiv …, 2020 - arxiv.org
We employ a combination of recent developments in semi-supervised learning for automatic
speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled …

Self-training with noisy student improves ImageNet classification

Q Xie, MT Luong, E Hovy… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet,
which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled …

WebFace260M: A benchmark unveiling the power of million-scale deep face recognition

Z Zhu, G Huang, J Deng, Y Ye… - Proceedings of the …, 2021 - openaccess.thecvf.com
In this paper, we contribute a new million-scale face benchmark containing noisy 4M
identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) …

BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition

Y Zhang, DS Park, W Han, J Qin… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …

Improved noisy student training for automatic speech recognition

DS Park, Y Zhang, Y Jia, W Han, CC Chiu, B Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Recently, a semi-supervised learning method known as "noisy student training" has been
shown to improve image classification performance of deep networks significantly. Noisy …

Self-training and pre-training are complementary for speech recognition

Q Xu, A Baevski, T Likhomanenko… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Self-training and unsupervised pre-training have emerged as effective approaches to
improve speech recognition systems using unlabeled data. However, it is not clear whether …