Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

SUPERB: Speech processing Universal PERformance Benchmark

S Yang, PH Chi, YS Chuang, CIJ Lai… - arXiv preprint arXiv …, 2021 - arxiv.org
Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …

Codec-SUPERB: An in-depth analysis of sound codec models

H Wu, HL Chung, YC Lin, YK Wu, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The sound codec's dual roles in minimizing data transmission latency and serving as
tokenizers underscore its critical importance. Recent years have witnessed significant …

Multilingual spoken term detection: a review

G Deekshitha, L Mary - International Journal of Speech Technology, 2020 - Springer
In modern multilingual societies, there is a demand for multilingual Automatic Speech
Recognition (ASR) and Spoken Term Detection (STD). Multilingual Spoken Term Detection …

Dynamic-SUPERB Phase-2: A collaboratively expanding benchmark for measuring the capabilities of spoken language models with 180 tasks

C Huang, WC Chen, S Yang, AT Liu, CA Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized
human-machine interactions by seamlessly integrating various forms of data. Developing a …

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

H Wu, X Chen, YC Lin, K Chang, J Du, KH Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural audio codec models are becoming increasingly important as they serve as
tokenizers for audio, enabling efficient transmission or facilitating speech language …

SCORE: Self-Supervised Correspondence Fine-Tuning for Improved Content Representations

A Meghanani, T Hain - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
There is a growing interest in cost-effective self-supervised fine-tuning (SSFT) of
self-supervised learning (SSL)-based speech models to obtain task-specific representations …

NUVA: a naming utterance verifier for aphasia treatment

DS Barbera, M Huckvale, V Fleming, E Upton… - Computer Speech & …, 2021 - Elsevier
Anomia (word-finding difficulties) is the hallmark of aphasia, an acquired language disorder
most commonly caused by stroke. Assessment of speech performance using picture naming …

Query by Example Search on Speech at MediaEval 2015

I Szöke, LJ Rodriguez-Fuentes, A Buzo, X Anguera… - MediaEval, 2015 - ceur-ws.org
In this paper, we describe the “Query by Example Search on Speech Task” (QUESST), held
as part of the MediaEval 2015 evaluation campaign. As in previous years, the proposed task …

A Large-Scale Evaluation of Speech Foundation Models

S Yang, HJ Chang, Z Huang, AT Liu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
The foundation model paradigm leverages a shared foundation model to achieve
state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific data …