HuBERT: Self-supervised speech representation learning by masked prediction of hidden units

WN Hsu, B Bolte, YHH Tsai, K Lakhotia… - … ACM transactions on …, 2021 - ieeexplore.ieee.org
Self-supervised approaches for speech representation learning are challenged by three
unique problems: (1) there are multiple sound units in each input utterance, (2) there is no …
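
A minimal sketch of the masked-prediction idea named in the title, assuming random stand-ins for the acoustic features and the offline k-means cluster labels ("hidden units"): a fraction of frames is masked, a small Transformer encoder runs over the corrupted sequence, and cross-entropy is computed only at the masked positions. The feature dimension, model sizes, and masking rate are illustrative assumptions, not the paper's configuration.

```python
# Sketch of HuBERT-style masked prediction of clustered "hidden units".
# Shapes, sizes, and the random stand-ins for features/labels are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_CLUSTERS, FEAT_DIM, DIM, T, BATCH = 100, 39, 256, 50, 4

class MaskedUnitPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(FEAT_DIM, DIM)
        self.mask_emb = nn.Parameter(torch.randn(DIM))        # learned vector for masked frames
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, N_CLUSTERS)                # per-frame cluster-id logits

    def forward(self, feats, mask):
        x = self.proj(feats)
        # Replace masked frames with the learned mask embedding.
        x = torch.where(mask.unsqueeze(-1), self.mask_emb.expand_as(x), x)
        return self.head(self.encoder(x))

feats = torch.randn(BATCH, T, FEAT_DIM)                       # stand-in for acoustic features
targets = torch.randint(0, N_CLUSTERS, (BATCH, T))            # stand-in for offline k-means labels
mask = torch.rand(BATCH, T) < 0.3                             # mask roughly 30% of frames

model = MaskedUnitPredictor()
logits = model(feats, mask)
loss = F.cross_entropy(logits[mask], targets[mask])           # loss only over masked frames
loss.backward()
print(float(loss))
```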

On generative spoken language modeling from raw audio

K Lakhotia, E Kharitonov, WN Hsu, Y Adi… - Transactions of the …, 2021 - direct.mit.edu
We introduce Generative Spoken Language Modeling, the task of learning the acoustic and
linguistic characteristics of a language from raw audio (no text, no labels), and …
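
A rough sketch of the discrete-unit pipeline this task rests on, with assumptions throughout: random vectors stand in for self-supervised features, k-means provides pseudo-units, consecutive duplicates are collapsed, and a toy bigram model replaces the paper's neural unit language model; resynthesis back to audio is a separate stage and is omitted.

```python
# Quantize frame-level features into pseudo-units, collapse repeats,
# then fit a toy language model over the unit sequence.
import numpy as np
from collections import Counter
from itertools import groupby
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(2000, 39))                 # stand-in for self-supervised features

kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(frames)
units = kmeans.predict(frames)                       # one pseudo-unit per frame
units = [u for u, _ in groupby(units)]               # collapse consecutive duplicates

# Toy bigram "unit LM": count transitions and sample a continuation.
bigrams = Counter(zip(units[:-1], units[1:]))

def next_unit(u):
    cands = [(b, c) for (a, b), c in bigrams.items() if a == u]
    if not cands:
        return units[0]                              # dead end: restart from the first unit
    symbols, counts = zip(*cands)
    probs = np.array(counts, dtype=float) / sum(counts)
    return int(rng.choice(symbols, p=probs))

seq = [int(units[0])]
for _ in range(20):
    seq.append(next_unit(seq[-1]))
print(seq)
```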

Speech resynthesis from discrete disentangled self-supervised representations

A Polyak, Y Adi, J Copet, E Kharitonov… - arXiv preprint arXiv …, 2021 - arxiv.org
We propose using self-supervised discrete representations for the task of speech
resynthesis. To generate disentangled representations, we separately extract low-bitrate …
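
As an illustration of the decomposition described here, the sketch below combines three low-bitrate streams per frame: discrete content units, quantized pitch, and a speaker code. The vocabulary sizes, embedding dimensions, and plain embedding lookups are assumptions rather than the paper's architecture, and the vocoder itself is omitted.

```python
# Frame-level concatenation of disentangled discrete streams before vocoding.
import torch
import torch.nn as nn

T = 100                                              # number of frames
content_units = torch.randint(0, 100, (T,))          # e.g. clustered self-supervised features
pitch_bins = torch.randint(0, 32, (T,))              # quantized F0 track
speaker_id = torch.tensor(3)                         # one id per utterance

content_emb = nn.Embedding(100, 128)
pitch_emb = nn.Embedding(32, 16)
speaker_emb = nn.Embedding(10, 64)

frame_features = torch.cat(
    [
        content_emb(content_units),                  # (T, 128) linguistic content
        pitch_emb(pitch_bins),                       # (T, 16) prosody
        speaker_emb(speaker_id).expand(T, -1),       # (T, 64) speaker identity, repeated per frame
    ],
    dim=-1,
)
print(frame_features.shape)                          # (T, 208), input to a neural vocoder
```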

Unsupervised speech representation learning using wavenet autoencoders

J Chorowski, RJ Weiss, S Bengio… - … /ACM transactions on …, 2019 - ieeexplore.ieee.org
We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …

HuBERT: How much can a bad teacher benefit ASR pre-training?

WN Hsu, YHH Tsai, B Bolte… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Compared to vision and language applications, self-supervised pre-training approaches for
ASR are challenged by three unique problems: (1) There are multiple sound units in each …

Textless speech emotion conversion using discrete and decomposed representations

F Kreuk, A Polyak, J Copet, E Kharitonov… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech emotion conversion is the task of modifying the perceived emotion of a speech
utterance while preserving the lexical content and speaker identity. In this study, we cast the …

Identifying patterns in financial markets: Extending the statistical jump model for regime identification

AO Aydınhan, PN Kolm, JM Mulvey, Y Shu - Annals of Operations …, 2024 - Springer
Regime-driven models are popular for addressing temporal patterns in both financial market
performance and underlying stylized factors, wherein a regime describes periods with …

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

S Cuervo, A Lancucki, R Marxer… - Advances in …, 2022 - proceedings.neurips.cc
The success of deep learning comes from its ability to capture the hierarchical structure of
data by learning high-level representations defined in terms of low-level ones. In this paper …
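
Because the hierarchical model builds on contrastive predictive coding, a minimal InfoNCE sketch may help fix ideas: a context vector predicts a latent frame a few steps ahead and is scored against negatives drawn from the other frames of the same utterance. The encoder, GRU context network, single prediction distance, and shapes are assumptions for illustration; the paper's variable-rate, hierarchical extensions are not shown.

```python
# Single-step CPC-style InfoNCE loss with in-utterance negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, D, K = 8, 40, 128, 3                           # batch, frames, latent dim, prediction offset

encoder = nn.Sequential(nn.Linear(64, D), nn.ReLU(), nn.Linear(D, D))
context = nn.GRU(D, D, batch_first=True)
predict_k = nn.Linear(D, D, bias=False)              # full CPC uses one projection per distance

x = torch.randn(B, T, 64)                            # stand-in for framed waveform features
z = encoder(x)                                       # latent per frame
c, _ = context(z)                                    # autoregressive context

pred = predict_k(c[:, :-K])                          # predict z_{t+K} from c_t
target = z[:, K:]                                    # true future latents

# For each (batch, t) the positive is target[batch, t]; negatives are the
# remaining time steps of the same utterance.
logits = torch.einsum("btd,bsd->bts", pred, target)  # similarity of each prediction to every frame
labels = torch.arange(T - K).expand(B, -1)           # index of the correct frame per prediction
loss = F.cross_entropy(logits.reshape(-1, T - K), labels.reshape(-1))
loss.backward()
print(float(loss))
```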

Neuro-serket: development of integrative cognitive system through the composition of deep probabilistic generative models

T Taniguchi, T Nakamura, M Suzuki… - New Generation …, 2020 - Springer
This paper describes a framework for the development of an integrative cognitive system
based on probabilistic generative models (PGMs) called Neuro-SERKET. Neuro-SERKET is …

Robust training of vector quantized bottleneck models

A Łańcucki, J Chorowski, G Sanchez… - … Joint Conference on …, 2020 - ieeexplore.ieee.org
In this paper, we demonstrate methods for reliable and efficient training of discrete
representations using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs) …
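
A compact sketch of the component under study, assuming an arbitrary codebook size and commitment weight: a vector-quantization bottleneck with the standard straight-through estimator plus codebook and commitment losses. The paper's specific robustness techniques are not reproduced here.

```python
# Vector-quantization bottleneck with a straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQBottleneck(nn.Module):
    def __init__(self, num_codes=64, dim=32, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):                             # z: (batch, time, dim)
        flat = z.reshape(-1, z.shape[-1])
        d = torch.cdist(flat, self.codebook.weight)   # distance to every code
        idx = d.argmin(dim=-1)                        # nearest-code assignment
        q = self.codebook(idx).view_as(z)
        # Codebook loss pulls codes toward encoder outputs; commitment loss
        # keeps encoder outputs close to their assigned codes.
        loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        q = z + (q - z).detach()                      # straight-through gradient to the encoder
        return q, idx.view(z.shape[:-1]), loss

enc = nn.Linear(16, 32)
vq = VQBottleneck()
z = enc(torch.randn(4, 10, 16))
q, codes, vq_loss = vq(z)
vq_loss.backward()
print(codes.shape, float(vq_loss))
```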