Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community has been seeing a significant trend of moving from deep
neural network-based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Streaming multi-talker ASR with token-level serialized output training

N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper proposes token-level serialized output training (t-SOT), a novel framework for
streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi …
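
For orientation, a minimal sketch of the serialization idea behind t-SOT is given below, assuming its published description: tokens from at most two overlapping utterances are merged into a single output stream in emission-time order, with a special channel-change token (written <cc> here) marking each switch between virtual output channels. The data layout, token name, and helper function are illustrative assumptions, not the authors' implementation.

    # Illustrative t-SOT-style serialization (structures and names assumed, not
    # taken from the paper's code): merge tokens of overlapping utterances into
    # one stream in emission-time order, inserting "<cc>" on channel switches.
    from dataclasses import dataclass

    @dataclass
    class TimedToken:
        text: str      # subword or word token
        time: float    # emission time in seconds
        channel: int   # virtual output channel (0 or 1 for up to two overlapping utterances)

    def serialize_tsot(tokens):
        """Return one token sequence sorted by time, with <cc> at every channel switch."""
        serialized = []
        current_channel = None
        for tok in sorted(tokens, key=lambda t: t.time):
            if current_channel is not None and tok.channel != current_channel:
                serialized.append("<cc>")
            serialized.append(tok.text)
            current_channel = tok.channel
        return serialized

    # Example: two partially overlapping utterances.
    mixed = [
        TimedToken("hello", 0.2, 0), TimedToken("how", 0.5, 0),
        TimedToken("good", 0.6, 1), TimedToken("are", 0.8, 0),
        TimedToken("morning", 0.9, 1), TimedToken("you", 1.1, 0),
    ]
    print(serialize_tsot(mixed))
    # ['hello', 'how', '<cc>', 'good', '<cc>', 'are', '<cc>', 'morning', '<cc>', 'you']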

On word error rate definitions and their efficient computation for multi-speaker speech recognition systems

T von Neumann, C Boeddeker… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose a general framework to compute the word error rate (WER) of ASR systems that
process recordings containing multiple speakers at their input and that produce multiple …
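
As context, one widely used definition in this space is the concatenated minimum-permutation WER (cpWER): each speaker's words are concatenated into one stream, and errors are counted under the assignment of hypothesis streams to reference streams that minimizes them. The sketch below illustrates only that definition with a brute-force search over assignments; it is not the paper's general framework or its efficient computation, and the dictionary-based input format is an assumption.

    # Illustrative cpWER computation (one common multi-speaker WER definition).
    # Brute-force over speaker assignments; fine for a handful of speakers only.
    from itertools import permutations

    def edit_distance(ref, hyp):
        """Word-level Levenshtein distance between two token lists (single-row DP)."""
        d = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            prev, d[0] = d[0], i
            for j, h in enumerate(hyp, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
        return d[-1]

    def cpwer(ref_by_spk, hyp_by_spk):
        """cpWER: per-speaker word streams scored under the error-minimizing speaker mapping."""
        refs = list(ref_by_spk.values())
        hyps = list(hyp_by_spk.values())
        # Pad with empty streams so every reference stream gets a partner.
        n = max(len(refs), len(hyps))
        refs += [[] for _ in range(n - len(refs))]
        hyps += [[] for _ in range(n - len(hyps))]
        total_ref_words = sum(len(r) for r in refs)
        best_errors = min(
            sum(edit_distance(r, h) for r, h in zip(refs, perm))
            for perm in permutations(hyps)
        )
        return best_errors / max(total_ref_words, 1)

    # Example with two reference speakers and two hypothesis streams.
    ref = {"spk1": "hello how are you".split(), "spk2": "good morning".split()}
    hyp = {"sys_a": "good morning".split(), "sys_b": "hello how are you".split()}
    print(cpwer(ref, hyp))  # 0.0, since the best mapping pairs sys_b with spk1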

Endpoint detection for streaming end-to-end multi-talker ASR

L Lu, J Li, Y Gong - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Streaming end-to-end multi-talker speech recognition aims at transcribing the overlapped
speech from conversations or meetings with an all-neural model in a streaming fashion …

One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition

S Cornell, J Jung, S Watanabe… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …

On Speaker Attribution with SURT

D Raj, M Wiesner, M Maciejewski… - arXiv preprint arXiv …, 2024 - arxiv.org
The Streaming Unmixing and Recognition Transducer (SURT) has recently become a
popular framework for continuous, streaming, multi-talker automatic speech recognition (ASR). With …

Alignment-Free Training for Transducer-based Multi-Talker ASR

T Moriya, S Horiguchi, M Delcroix, R Masumura… - arXiv preprint arXiv …, 2024 - arxiv.org
Extending the RNN Transducer (RNNT) to recognize multi-talker speech is essential for
wider automatic speech recognition (ASR) applications. Multi-talker RNNT (MT-RNNT) aims …

Separator-transducer-segmenter: Streaming recognition and segmentation of multi-party speech

I Sklyar, A Piunova, C Osendorfer - arXiv preprint arXiv:2205.05199, 2022 - arxiv.org
Streaming recognition and segmentation of multi-party conversations with overlapping
speech is crucial for the next generation of voice assistant applications. In this work we …

EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

SH Mun, MH Han, C Moon, NS Kim - arXiv preprint arXiv:2312.06065, 2023 - arxiv.org
In recent years, there have been studies to further improve end-to-end neural speaker
diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel …

Directed speech separation for automatic speech recognition of long form conversational speech

R Paturi, S Srinivasan, K Kirchhoff… - arXiv preprint arXiv …, 2021 - arxiv.org
Many of the recent advances in speech separation are primarily aimed at synthetic mixtures
of short audio utterances with high degrees of overlap. Most of these approaches need an …