A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

FLAVA: A foundational language and vision alignment model

A Singh, R Hu, V Goswami… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …

XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

FLEURS: Few-shot learning evaluation of universal representations of speech

A Conneau, M Ma, S Khanuja, Y Zhang… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of
Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on …

VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation

C Wang, M Riviere, A Lee, A Wu, C Talnikar… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …

Supervised contrastive learning for pre-trained language model fine-tuning

B Gunel, J Du, A Conneau, V Stoyanov - arXiv preprint arXiv:2011.01403, 2020 - arxiv.org
State-of-the-art natural language understanding classification models follow two stages: pre-
training a large language model on an auxiliary task, and then fine-tuning the model on a …
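
For orientation, a minimal NumPy sketch of the batch-wise supervised contrastive term this abstract refers to: each example is pulled toward same-class examples in the batch and pushed away from the rest. The temperature value and the mean-over-positives weighting are illustrative assumptions, not the paper's exact recipe, and the interpolation with the usual cross-entropy loss is omitted here.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of example embeddings (illustrative)."""
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                      # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                   # an example is never its own positive
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    losses = []
    for i, label in enumerate(labels):
        positives = np.where((labels == label) & (np.arange(len(labels)) != i))[0]
        if positives.size:                           # skip anchors with no same-class pair
            losses.append(-log_prob[i, positives].mean())
    return float(np.mean(losses)) if losses else 0.0

# Toy usage: four sentence embeddings, two classes.
emb = np.random.randn(4, 8)
print(supervised_contrastive_loss(emb, [0, 0, 1, 1]))
```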

Self-supervised learning with random-projection quantizer for speech recognition

CC Chiu, J Qin, Y Zhang, J Yu… - … Conference on Machine …, 2022 - proceedings.mlr.press
We present a simple and effective self-supervised learning approach for speech recognition.
The approach learns a model to predict the masked speech signals, in the form of discrete …
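
For orientation, a minimal NumPy sketch of the random-projection quantizer idea named in this abstract: a frozen random projection and a frozen random codebook turn continuous speech frames into discrete labels. The dimensions, codebook size, and cosine-style nearest-neighbour lookup below are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

class RandomProjectionQuantizer:
    """Frozen random projection + frozen random codebook -> discrete frame labels."""

    def __init__(self, input_dim, codebook_size=8192, code_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Neither the projection nor the codebook is updated during training.
        self.projection = rng.normal(size=(input_dim, code_dim))
        codebook = rng.normal(size=(codebook_size, code_dim))
        self.codebook = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)

    def __call__(self, frames):
        """frames: (T, input_dim) speech features -> (T,) integer codebook indices."""
        projected = frames @ self.projection
        projected = projected / np.linalg.norm(projected, axis=1, keepdims=True)
        # Nearest codebook entry; with unit-norm vectors this matches cosine similarity.
        return np.argmax(projected @ self.codebook.T, axis=1)

# Toy usage: quantise 50 frames of 80-dim features into discrete targets.
quantizer = RandomProjectionQuantizer(input_dim=80)
targets = quantizer(np.random.randn(50, 80))
```

In the training setup the abstract describes, spans of the input speech are masked and the model is trained to predict these discrete labels at the masked positions; the feature extraction and masking strategy are not shown in this sketch.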

Layer-wise analysis of a self-supervised speech representation model

A Pasad, JC Chou, K Livescu - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …

Robust wav2vec 2.0: Analyzing domain shift in self-supervised pre-training

WN Hsu, A Sriram, A Baevski, T Likhomanenko… - arXiv preprint arXiv …, 2021 - arxiv.org
Self-supervised learning of speech representations has been a very active research area
but most work is focused on a single domain such as read audio books for which there exist …