Property-aware multi-speaker data simulation: A probabilistic modelling technique for synthetic...

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization

S Cornell, T Park, S Huang, C Boeddeker… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper presents the CHiME-8 DASR challenge which carries on from the previous
edition CHiME-7 DASR (C7DASR) and the past CHiME-6 challenge. It focuses on joint multi …

Cited by 2 · Related articles · All 2 versions

One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition

S Cornell, J Jung, S Watanabe… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …

Cited by 13 · Related articles · All 3 versions

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

T Park, I Medennikov, K Dhawan, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

We propose Sortformer, a novel neural model for speaker diarization, trained with
unconventional objectives compared to existing end-to-end diarization models. The …

Cited by 1 · Related articles · All 2 versions

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

H Huang, T Park, K Dhawan, I Medennikov… - arXiv preprint arXiv …, 2024 - arxiv.org

Self-supervised learning has been proved to benefit a wide range of speech processing
tasks, such as speech recognition/translation, speaker verification and diarization, etc …

Cited by 1 · Related articles · All 2 versions

A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings

H Han, N Kumar - arXiv preprint arXiv:2402.09797, 2024 - arxiv.org

In this work, we propose a novel cross-talk rejection framework for a multi-channel multi-
talker setup for a live multiparty interactive show. Our far-field audio setup is required to be …

Cited by 1 · Related articles · All 2 versions

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition

S Cornell, J Darefsky, Z Duan, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org

Currently, a common approach in many speech processing tasks is to leverage large scale
pre-trained models by fine-tuning them on in-domain data for a particular application. Yet …