HuBERT: Self-supervised speech representation learning by masked prediction of hidden units
Self-supervised approaches for speech representation learning are challenged by three
unique problems: (1) there are multiple sound units in each input utterance, (2) there is no …
On generative spoken language modeling from raw audio
We introduce Generative Spoken Language Modeling, the task of learning the
acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and …
Speech resynthesis from discrete disentangled self-supervised representations
We propose using self-supervised discrete representations for the task of speech
resynthesis. To generate disentangled representations, we separately extract low-bitrate …
Unsupervised speech representation learning using WaveNet autoencoders
We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …
HuBERT: How much can a bad teacher benefit ASR pre-training?
Compared to vision and language applications, self-supervised pre-training approaches for
ASR are challenged by three unique problems: (1) There are multiple sound units in each …
Textless speech emotion conversion using discrete and decomposed representations
Speech emotion conversion is the task of modifying the perceived emotion of a speech
utterance while preserving the lexical content and speaker identity. In this study, we cast the …
Identifying patterns in financial markets: Extending the statistical jump model for regime identification
Regime-driven models are popular for addressing temporal patterns in both financial market
performance and underlying stylized factors, wherein a regime describes periods with …
Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
The success of deep learning comes from its ability to capture the hierarchical structure of
data by learning high-level representations defined in terms of low-level ones. In this paper …
Neuro-SERKET: Development of integrative cognitive system through the composition of deep probabilistic generative models
This paper describes a framework for the development of an integrative cognitive system
based on probabilistic generative models (PGMs) called Neuro-SERKET. Neuro-SERKET is …
Robust training of vector quantized bottleneck models
In this paper we demonstrate methods for reliable and efficient training of discrete
representations using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs) …
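
The last entry names VQ-VAEs but the snippet is cut off before any method detail, so as a point of reference here is a minimal, self-contained sketch of the vector-quantization bottleneck at the core of VQ-VAE training: nearest-codebook lookup, a codebook and commitment loss, and a straight-through gradient. It is written in PyTorch purely for illustration; the codebook size, embedding dimension, and commitment weight are assumed values, and none of the cited paper's robustness techniques are reproduced.

    # Illustrative sketch only: a basic VQ-VAE quantization bottleneck.
    # num_codes, dim, and beta are assumed values, not taken from the cited paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VectorQuantizer(nn.Module):
        def __init__(self, num_codes=512, dim=64, beta=0.25):
            super().__init__()
            self.codebook = nn.Embedding(num_codes, dim)  # learnable code vectors
            nn.init.uniform_(self.codebook.weight, -1.0 / num_codes, 1.0 / num_codes)
            self.beta = beta  # weight of the commitment term

        def forward(self, z_e):
            # z_e: (batch, time, dim) continuous encoder outputs.
            flat = z_e.reshape(-1, z_e.size(-1))                 # (batch*time, dim)
            dists = torch.cdist(flat, self.codebook.weight)      # distance to every code
            codes = dists.argmin(dim=-1).view(z_e.shape[:-1])    # nearest code per frame
            z_q = self.codebook(codes)                           # quantized vectors
            # Codebook loss moves codes toward encoder outputs; commitment loss
            # keeps encoder outputs close to their assigned codes.
            loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
            # Straight-through estimator: gradients flow from z_q back to z_e past the argmin.
            z_q = z_e + (z_q - z_e).detach()
            return z_q, codes, loss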