Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
Superb: Speech processing universal performance benchmark
Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …
Codec-superb: An in-depth analysis of sound codec models
The sound codec's dual roles in minimizing data transmission latency and serving as
tokenizers underscore its critical importance. Recent years have witnessed significant …
Multilingual spoken term detection: a review
G Deekshitha, L Mary - International Journal of Speech Technology, 2020 - Springer
In modern multilingual societies, there is a demand for multilingual Automatic Speech
Recognition (ASR) and Spoken Term Detection (STD). Multilingual Spoken Term Detection …
Dynamic-superb phase-2: A collaboratively expanding benchmark for measuring the capabilities of spoken language models with 180 tasks
Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-
machine interactions by seamlessly integrating various forms of data. Developing a …
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Neural audio codec models are becoming increasingly important as they serve as
tokenizers for audio, enabling efficient transmission or facilitating speech language …
SCORE: Self-Supervised Correspondence Fine-Tuning for Improved Content Representations
A Meghanani, T Hain - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
There is a growing interest in cost-effective self-supervised fine-tuning (SSFT) of self-
supervised learning (SSL)-based speech models to obtain task-specific representations …
NUVA: a naming utterance verifier for aphasia treatment
DS Barbera, M Huckvale, V Fleming, E Upton… - Computer Speech & …, 2021 - Elsevier
Anomia (word-finding difficulties) is the hallmark of aphasia, an acquired language disorder
most commonly caused by stroke. Assessment of speech performance using picture naming …
Query by Example Search on Speech at MediaEval 2015
In this paper, we describe the “Query by Example Search on Speech Task” (QUESST), held
as part of the MediaEval 2015 evaluation campaign. As in previous years, the proposed task …
A Large-Scale Evaluation of Speech Foundation Models
The foundation model paradigm leverages a shared foundation model to achieve state-of-
the-art (SOTA) performance for various tasks, requiring minimal downstream-specific data …