A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …
Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
FLAVA: A foundational language and vision alignment model
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …
XLS-R: Self-supervised cross-lingual speech representation learning at scale
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …
FLEURS: Few-shot learning evaluation of universal representations of speech
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of
Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on …
VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …
Supervised contrastive learning for pre-trained language model fine-tuning
State-of-the-art natural language understanding classification models follow two stages: pre-
training a large language model on an auxiliary task, and then fine-tuning the model on a …
Self-supervised learning with random-projection quantizer for speech recognition
We present a simple and effective self-supervised learning approach for speech recognition.
The approach learns a model to predict the masked speech signals, in the form of discrete …
Layer-wise analysis of a self-supervised speech representation model
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …
Robust wav2vec 2.0: Analyzing domain shift in self-supervised pre-training
Self-supervised learning of speech representations has been a very active research area
but most work is focused on a single domain such as read audio books for which there exist …