DinoSR: Self-distillation and online clustering for self-supervised speech representation learning

AH Liu, HJ Chang, M Auli, WN Hsu… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this paper, we introduce self-distillation and online clustering for self-supervised speech
representation learning (DinoSR) which combines masked language modeling, self …
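The two ingredients named in the title are an EMA teacher (self-distillation) and an online codebook (clustering). A minimal numpy sketch of those two updates, with all shapes, decay rates, and the nearest-codeword assignment chosen purely for illustration, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def ema_update(teacher, student, tau=0.99):
    # Self-distillation: teacher weights track an exponential moving
    # average of the student weights.
    return tau * teacher + (1 - tau) * student

def online_cluster_update(codebook, feats, assignments, decay=0.9):
    # Online clustering: each codeword drifts toward the mean of the
    # teacher features currently assigned to it.
    updated = codebook.copy()
    for k in range(codebook.shape[0]):
        members = feats[assignments == k]
        if len(members):
            updated[k] = decay * codebook[k] + (1 - decay) * members.mean(axis=0)
    return updated

# Toy demo: 32 teacher frames of dim 8, a codebook of 4 entries.
teacher_feats = rng.normal(size=(32, 8))
codebook = rng.normal(size=(4, 8))
dists = ((teacher_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
assignments = dists.argmin(axis=1)   # discrete targets for the student
codebook = online_cluster_update(codebook, teacher_feats, assignments)
```

The student would then be trained to predict `assignments` at masked positions; that prediction head is omitted here.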

SUPERB @ SLT 2022: Challenge on generalization and efficiency of self-supervised speech representation learning

T Feng, A Dong, CF Yeh, S Yang, TQ Lin… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised
speech representation for better performance, generalization, and efficiency. The challenge …

A Large-Scale Evaluation of Speech Foundation Models

S Yang, HJ Chang, Z Huang, AT Liu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
The foundation model paradigm leverages a shared foundation model to achieve state-of-
the-art (SOTA) performance for various tasks, requiring minimal downstream-specific data …

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

R Sivaguru, VS Lodagala, S Umesh - arXiv preprint arXiv:2308.01018, 2023 - arxiv.org
While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration
as conditional inputs, it still leaves scope for richer representations. As a part of this work, we …

data2vec-aqc: Search for the right teaching assistant in the teacher-student training setup

VS Lodagala, S Ghosh, S Umesh - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-
aqc, for speech representation learning from unlabeled speech data. Our goal is to improve …

RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training

J Asl, E Blanco, D Takabi - 2023 - digitalcommons.odu.edu
Pre-trained language models (PLMs) have demonstrated their exceptional performance
across a wide range of natural language processing tasks. The utilization of PLM-based …

Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations

V Krishna, S Ganapathy - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
The self-supervised learning (SSL) of speech, with discrete tokenization (pseudo-labels),
while illustrating performance improvements in low-resource speech recognition, has faced …
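The loss named in the title, a supervised contrastive objective driven by pseudo-labels, can be sketched generically as follows. This is a SupCon-style loss in numpy, not the authors' implementation; the toy labels below stand in for SSL pseudo-labels:

```python
import numpy as np

def supcon_loss(z, labels, temp=0.1):
    """Supervised contrastive loss: embeddings sharing a (pseudo-)label
    are pulled together, all other pairs are pushed apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temp
    n = len(labels)
    logits = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    exps = np.exp(logits)
    not_self = ~np.eye(n, dtype=bool)
    denom = (exps * not_self).sum(axis=1)
    total, count = 0.0, 0
    for i in range(n):
        pos = (labels == labels[i]) & not_self[i]
        if pos.any():
            total += (-np.log(exps[i, pos] / denom[i])).mean()
            count += 1
    return total / max(count, 1)

# Toy check: identical embeddings under the same pseudo-label score
# far lower than when positives point at orthogonal embeddings.
z = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
aligned = supcon_loss(z, np.array([0, 0, 1, 1]))
mismatched = supcon_loss(z, np.array([0, 1, 0, 1]))
```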

The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

SB Kalluri, P Singh, PR Chowdhuri, A Kulkarni… - arXiv preprint arXiv …, 2024 - arxiv.org
The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE)
2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of …

MCR-Data2vec 2.0: Improving Self-supervised Speech Pre-training via Model-level Consistency Regularization

JW Yoon, SM Kim, NS Kim - arXiv preprint arXiv:2306.08463, 2023 - arxiv.org
Self-supervised learning (SSL) has shown significant progress in speech processing tasks.
However, despite the intrinsic randomness in the Transformer structure, such as dropout …
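The consistency-regularization idea hinted at here, exploiting the intrinsic randomness of dropout, can be illustrated generically: run the same input through the model twice with independent dropout masks and penalize the disagreement between the two outputs. A toy numpy sketch under that assumption, not the MCR-Data2vec 2.0 training recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W, rng, p_drop=0.1):
    # One stochastic forward pass: linear layer + inverted dropout.
    h = x @ W
    keep = rng.random(h.shape) >= p_drop
    return h * keep / (1.0 - p_drop)

def consistency_loss(x, W, rng, p_drop=0.1):
    # Two passes with independent dropout masks; penalize disagreement.
    y1 = forward(x, W, rng, p_drop)
    y2 = forward(x, W, rng, p_drop)
    return ((y1 - y2) ** 2).mean()

x = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 3))
loss = consistency_loss(x, W, rng)
```

With `p_drop=0` the two passes coincide and the penalty vanishes, which is the sanity check for this kind of regularizer.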

Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling

I Hwang, K Lee - arXiv preprint arXiv:2404.00856, 2024 - arxiv.org
Recently, there have been efforts to encode the linguistic information of speech using a self-
supervised framework for speech synthesis. However, predicting representations from …
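As a generic illustration of soft pooling over speech frames: weights come from a softmax over per-frame scores, and the pooled vector is the weighted mean. The variable-length segmentation mechanism of the paper is not reproduced; the scoring is a plain softmax chosen for the sketch:

```python
import numpy as np

def soft_pool(frames, scores, temp=1.0):
    # Soft pooling: a temperature-scaled softmax over per-frame scores
    # yields weights, and the output is the weighted mean of the frames.
    w = np.exp((scores - scores.max()) / temp)   # max-shift for stability
    w = w / w.sum()
    return (w[:, None] * frames).sum(axis=0)

frames = np.array([[0., 0.], [2., 2.]])
uniform = soft_pool(frames, np.array([0., 0.]))   # equal weights: plain mean
peaked = soft_pool(frames, np.array([0., 50.]))   # weight concentrates on frame 2
```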