DinoSR: Self-distillation and online clustering for self-supervised speech representation learning
In this paper, we introduce self-distillation and online clustering for self-supervised speech
representation learning (DinoSR) which combines masked language modeling, self …
LEXTREME: A multi-lingual and multi-task benchmark for the legal domain
Lately, propelled by the phenomenal advances around the transformer architecture, the
legal NLP field has enjoyed spectacular growth. To measure progress, well curated and …
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …
Pheme: Efficient and Conversational Speech Generation
In recent years, speech generation has seen remarkable progress, now achieving one-shot
generation capability that is often virtually indistinguishable from real human voice …
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
This paper introduces Robust Spin (R-Spin), a data-efficient self-supervised fine-tuning
framework for speaker and noise-invariant speech representations by learning discrete …
Evolutionary Multi-objective Optimization for Contextual Adversarial Example Generation
The emergence of the 'code naturalness' concept, which suggests that software code shares
statistical properties with natural language, paves the way for deep neural networks (DNNs) …
A survey of Polish ASR speech datasets
M Junczyk - Poznan Studies in Contemporary Linguistics, 2024 - degruyter.com
Access to speech datasets is essential for the effective use of modern ASR systems in low-
resource languages like Polish. However, the lack of centralized information and metadata …
Perturbation-invariant Speech Representation Learning by Online Clustering
HJ Chang - 2024 - dspace.mit.edu
Despite success across various tasks, self-supervised speech models face significant
challenges in enhancing content-related performance with unlabeled data, requiring …
Enhancing Automated English Speaking Assessment for L2 Speakers with BERT and Wav2vec2.0 Fusion
WH Peng, HW Wang, S Chen… - Proceedings of the 35th …, 2023 - aclanthology.org
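Abstract: English is increasingly used as a second language (English as a Second Language, ESL) in many countries,
which has also driven the development of computer-assisted language learning; in recent years, automated speaking assessment has become an especially popular area. However …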