DinoSR: Self-distillation and online clustering for self-supervised speech representation learning

AH Liu, HJ Chang, M Auli, WN Hsu… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this paper, we introduce self-distillation and online clustering for self-supervised speech
representation learning (DinoSR) which combines masked language modeling, self …

LEXTREME: A multi-lingual and multi-task benchmark for the legal domain

J Niklaus, V Matoshi, P Rani, A Galassi… - arXiv preprint arXiv …, 2023 - arxiv.org
Lately, propelled by the phenomenal advances around the transformer architecture, the
legal NLP field has enjoyed spectacular growth. To measure progress, well curated and …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - arXiv preprint arXiv:2307.00162, 2023 - arxiv.org
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …

Pheme: Efficient and Conversational Speech Generation

P Budzianowski, T Sereda, T Cichy, I Vulić - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, speech generation has seen remarkable progress, now achieving one-shot
generation capability that is often virtually indistinguishable from real human voice …

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

HJ Chang, J Glass - arXiv preprint arXiv:2311.09117, 2023 - arxiv.org
This paper introduces Robust Spin (R-Spin), a data-efficient self-supervised fine-tuning
framework for speaker and noise-invariant speech representations by learning discrete …

Evolutionary Multi-objective Optimization for Contextual Adversarial Example Generation

S Zhou, M Huang, Y Sun, K Li - Proceedings of the ACM on Software …, 2024 - dl.acm.org
The emergence of the 'code naturalness' concept, which suggests that software code shares
statistical properties with natural language, paves the way for deep neural networks (DNNs) …

A survey of Polish ASR speech datasets

M Junczyk - Poznan Studies in Contemporary Linguistics, 2024 - degruyter.com
Access to speech datasets is essential for the effective use of modern ASR systems in low-
resource languages like Polish. However, the lack of centralized information and metadata …

Perturbation-invariant Speech Representation Learning by Online Clustering

HJ Chang - 2024 - dspace.mit.edu
Despite success across various tasks, self-supervised speech models face significant
challenges in enhancing content-related performance with unlabeled data, requiring …

Enhancing Automated English Speaking Assessment for L2 Speakers with BERT and Wav2vec2.0 Fusion

WH Peng, HW Wang, S Chen… - Proceedings of the 35th …, 2023 - aclanthology.org
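Abstract: English is increasingly becoming a second language in many countries (English as a Second Language, ESL),
which has also driven the development of computer-assisted language learning; in recent years, automated speaking assessment has become an especially active area. However …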