CASA-ASR: Context-Aware Speaker-Attributed ASR

D Raj, M Wiesner, M Maciejewski… - arXiv preprint arXiv …, 2024 - arxiv.org

The Streaming Unmixing and Recognition Transducer (SURT) has recently become a
popular framework for continuous, streaming, multi-talker speech recognition (ASR). With …

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Z Jin, Y Yang, M Shi, W Kang, X Yang, Z Yao… - arXiv preprint arXiv …, 2024 - arxiv.org

The evolving speech processing landscape is increasingly focused on complex scenarios
like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions …

SA-Paraformer: Non-Autoregressive End-to-End Speaker-Attributed ASR

Y Li, F Yu, Y Liang, P Guo, M Shi, Z Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising
results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to …

Advancing Multi-talker ASR Performance with Large Language Models

M Shi, Z Jin, Y Xu, Y Xu, SX Zhang, K Wei… - arXiv preprint arXiv …, 2024 - arxiv.org

Recognizing overlapping speech from multiple speakers in conversational scenarios is one
of the most challenging problems for automatic speech recognition (ASR). Serialized output …

AG-LSEC: Audio Grounded Lexical Speaker Error Correction

R Paturi, X Li, S Srinivasan - arXiv preprint arXiv:2406.17266, 2024 - arxiv.org

Speaker Diarization (SD) systems are typically audio-based and operate independently of
the ASR system in traditional speech transcription pipelines and can have speaker errors …