M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge

VarArray meets t-SOT: Advancing the state of the art of streaming distant conversational speech recognition

N Kanda, J Wu, X Wang, Z Chen, J Li… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

This paper presents a novel streaming automatic speech recognition (ASR) framework for
multi-talker overlapping speech captured by a distant microphone array with an arbitrary …

Cited by 13 - Related articles - All 3 versions

The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios

S Cornell, M Wiesner, S Watanabe, D Raj… - arXiv preprint arXiv …, 2023 - arxiv.org

The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …

Cited by 30 - Related articles - All 7 versions

An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings

L Serafini, S Cornell, G Morrone, E Zovato… - Computer Speech & …, 2023 - Elsevier

We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …

Cited by 6 - Related articles - All 6 versions

Powerset multi-class cross entropy loss for neural speaker diarization

A Plaquet, H Bredin - arXiv preprint arXiv:2310.13025, 2023 - arxiv.org

Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …

Cited by 31 - Related articles - All 10 versions

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …

Cited by 46 - Related articles - All 18 versions

GPU-accelerated guided source separation for meeting transcription

D Raj, D Povey, S Khudanpur - arXiv preprint arXiv:2212.05271, 2022 - arxiv.org

Guided source separation (GSS) is a type of target-speaker extraction method that relies on
pre-computed speaker activities and blind source separation to perform front-end …

Cited by 29 - Related articles - All 10 versions

UNSSOR: unsupervised neural speech separation by leveraging over-determined training mixtures

ZQ Wang, S Watanabe - Advances in Neural Information …, 2024 - proceedings.neurips.cc

In reverberant conditions with multiple concurrent speakers, each microphone acquires a
mixture signal of multiple speakers at a different location. In over-determined conditions …

Cited by 6 - Related articles - All 8 versions

Cross-channel attention-based target speaker voice activity detection: Experimental results for the M2MeT challenge

W Wang, X Qin, M Li - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org

DukeECE. As the highly overlapped speech exists in the dataset, we employ an x-vector-
based target-speaker voice activity detection (TS-VAD) to find the overlap between …

Cited by 29 - Related articles - All 6 versions

Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge

F Yu, S Zhang, P Guo, Y Fu, Z Du… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge
(M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech …

Cited by 25 - Related articles - All 5 versions

DiaPer: End-to-end neural diarization with perceiver-based attractors

F Landini, T Stafylakis, L Burget - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, endto …

Cited by 4 - Related articles - All 2 versions