The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization

S Cornell, T Park, S Huang, C Boeddeker… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents the CHiME-8 DASR challenge which carries on from the previous
edition CHiME-7 DASR (C7DASR) and the past CHiME-6 challenge. It focuses on joint multi …

One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition

S Cornell, J Jung, S Watanabe… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

T Park, I Medennikov, K Dhawan, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose Sortformer, a novel neural model for speaker diarization, trained with
unconventional objectives compared to existing end-to-end diarization models. The …

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

H Huang, T Park, K Dhawan, I Medennikov… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-supervised learning has been proved to benefit a wide range of speech processing
tasks, such as speech recognition/translation, speaker verification and diarization, etc …

A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings

H Han, N Kumar - arXiv preprint arXiv:2402.09797, 2024 - arxiv.org
In this work, we propose a novel cross-talk rejection framework for a multi-channel
multi-talker setup for a live multiparty interactive show. Our far-field audio setup is required to be …

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition

S Cornell, J Darefsky, Z Duan, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org
Currently, a common approach in many speech processing tasks is to leverage large scale
pre-trained models by fine-tuning them on in-domain data for a particular application. Yet …