DinoSR: Self-Distillation and Online Clustering for Self-Supervised Speech Representation Learning
In this paper, we introduce self-distillation and online clustering for self-supervised speech
representation learning (DinoSR), which combines masked language modeling, self …
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised
speech representation for better performance, generalization, and efficiency. The challenge …
A Large-Scale Evaluation of Speech Foundation Models
The foundation model paradigm leverages a shared foundation model to achieve state-of-
the-art (SOTA) performance for various tasks, requiring minimal downstream-specific data …
SALTTS: Leveraging Self-Supervised Speech Representations for Improved Text-to-Speech Synthesis
R Sivaguru, VS Lodagala, S Umesh - arXiv preprint arXiv:2308.01018, 2023 - arxiv.org
While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration
as conditional inputs, it still leaves scope for richer representations. As a part of this work, we …
data2vec-aqc: Search for the right teaching assistant in the teacher-student training setup
VS Lodagala, S Ghosh, S Umesh - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-
aqc, for speech representation learning from unlabeled speech data. Our goal is to improve …
RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training
Pre-trained language models (PLMs) have demonstrated their exceptional performance
across a wide range of natural language processing tasks. The utilization of PLM-based …
Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations
V Krishna, S Ganapathy - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
The self-supervised learning (SSL) of speech, with discrete tokenization (pseudo-labels),
while illustrating performance improvements in low-resource speech recognition, has faced …
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
SB Kalluri, P Singh, PR Chowdhuri, A Kulkarni… - arXiv preprint arXiv …, 2024 - arxiv.org
The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE)
2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of …
MCR-Data2vec 2.0: Improving Self-supervised Speech Pre-training via Model-level Consistency Regularization
Self-supervised learning (SSL) has shown significant progress in speech processing tasks.
However, despite the intrinsic randomness in the Transformer structure, such as dropout …
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
I Hwang, K Lee - arXiv preprint arXiv:2404.00856, 2024 - arxiv.org
Recently, there have been efforts to encode the linguistic information of speech using a self-
supervised framework for speech synthesis. However, predicting representations from …