A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …
Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
FLAVA: A foundational language and vision alignment model
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …
XLS-R: Self-supervised cross-lingual speech representation learning at scale
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …
FLEURS: Few-shot learning evaluation of universal representations of speech
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of
Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on …
VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …
Supervised contrastive learning for pre-trained language model fine-tuning
State-of-the-art natural language understanding classification models follow two stages: pre-
training a large language model on an auxiliary task, and then fine-tuning the model on a …
Self-supervised learning with random-projection quantizer for speech recognition
We present a simple and effective self-supervised learning approach for speech recognition.
The approach learns a model to predict the masked speech signals, in the form of discrete …
Layer-wise analysis of a self-supervised speech representation model
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …
Robust wav2vec 2.0: Analyzing domain shift in self-supervised pre-training
Self-supervised learning of speech representations has been a very active research area
but most work is focused on a single domain such as read audio books for which there exist …