On Speaker Attribution with SURT

D Raj, M Wiesner, M Maciejewski… - arXiv preprint arXiv …, 2024 - arxiv.org
The Streaming Unmixing and Recognition Transducer (SURT) has recently become a
popular framework for continuous, streaming, multi-talker speech recognition (ASR). With …

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Z Jin, Y Yang, M Shi, W Kang, X Yang, Z Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
The evolving speech processing landscape is increasingly focused on complex scenarios
like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions …

Sa-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR

Y Li, F Yu, Y Liang, P Guo, M Shi, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising
results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to …

Advancing Multi-talker ASR Performance with Large Language Models

M Shi, Z Jin, Y Xu, Y Xu, SX Zhang, K Wei… - arXiv preprint arXiv …, 2024 - arxiv.org
Recognizing overlapping speech from multiple speakers in conversational scenarios is one
of the most challenging problems for automatic speech recognition (ASR). Serialized output …

AG-LSEC: Audio Grounded Lexical Speaker Error Correction

R Paturi, X Li, S Srinivasan - arXiv preprint arXiv:2406.17266, 2024 - arxiv.org
Speaker Diarization (SD) systems are typically audio-based and operate independently of
the ASR system in traditional speech transcription pipelines and can have speaker errors …