The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
This paper presents the CHiME-8 DASR challenge which carries on from the previous
edition CHiME-7 DASR (C7DASR) and the past CHiME-6 challenge. It focuses on joint multi …
One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition
This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
We propose Sortformer, a novel neural model for speaker diarization, trained with
unconventional objectives compared to existing end-to-end diarization models. The …
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
Self-supervised learning has been proved to benefit a wide range of speech processing
tasks, such as speech recognition/translation, speaker verification and diarization, etc …
A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings
In this work, we propose a novel cross-talk rejection framework for a multi-channel multi-
talker setup for a live multiparty interactive show. Our far-field audio setup is required to be …
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
S Cornell, J Darefsky, Z Duan, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org
Currently, a common approach in many speech processing tasks is to leverage large scale
pre-trained models by fine-tuning them on in-domain data for a particular application. Yet …