Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
Google USM: Scaling automatic speech recognition beyond 100 languages
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …
w2v-BERT: Combining contrastive learning and masked language modeling for self-supervised speech pre-training
Motivated by the success of masked language modeling (MLM) in pre-training natural
language processing models, we propose w2v-BERT that explores MLM for self-supervised …
Rethinking pre-training and self-training
Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet
pre-training is commonly used to initialize the backbones of object detection and …
Pushing the limits of semi-supervised learning for automatic speech recognition
We employ a combination of recent developments in semi-supervised learning for automatic
speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled …
Self-training with noisy student improves ImageNet classification
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet,
which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled …
WebFace260M: A benchmark unveiling the power of million-scale deep face recognition
In this paper, we contribute a new million-scale face benchmark containing noisy 4M
identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) …
BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …
Improved noisy student training for automatic speech recognition
Recently, a semi-supervised learning method known as "noisy student training" has been
shown to improve image classification performance of deep networks significantly. Noisy …
Self-training and pre-training are complementary for speech recognition
Self-training and unsupervised pre-training have emerged as effective approaches to
improve speech recognition systems using unlabeled data. However, it is not clear whether …