The third DIHARD diarization challenge

End-to-end speaker segmentation for overlap-aware resegmentation

H Bredin, A Laurent - arXiv preprint arXiv:2104.04045, 2021 - arxiv.org

Speaker segmentation consists in partitioning a conversation between one or more
speakers into speaker turns. Usually addressed as the late combination of three sub-tasks …

Cited by 201 · Related articles · All 17 versions

The speakin system for voxceleb speaker recognition challange 2021

M Zhao, Y Ma, M Liu, M Xu - arXiv preprint arXiv:2109.01989, 2021 - arxiv.org

This report describes our submission to the track 1 and track 2 of the VoxCeleb Speaker
Recognition Challenge 2021 (VoxSRC 2021). Both track 1 and track 2 share the same …

Cited by 83 · Related articles · All 6 versions

Powerset multi-class cross entropy loss for neural speaker diarization

A Plaquet, H Bredin - arXiv preprint arXiv:2310.13025, 2023 - arxiv.org

Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …

Cited by 100 · Related articles · All 10 versions

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …

Cited by 121 · Related articles · All 18 versions

Encoder-decoder based attractors for end-to-end neural diarization

S Horiguchi, Y Fujita, S Watanabe… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org

This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …

Cited by 67 · Related articles · All 6 versions

How might we create better benchmarks for speech recognition?

A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org

The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …

Cited by 45 · Related articles · All 9 versions

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

K Kinoshita, M Delcroix, N Tawara - arXiv preprint arXiv:2105.09040, 2021 - arxiv.org

Recently, we proposed a novel speaker diarization method called End-to-End-Neural-
Diarization-vector clustering (EEND-vector clustering) that integrates clustering-based and …

Cited by 62 · Related articles · All 7 versions

DiaPer: End-to-end neural diarization with perceiver-based attractors

F Landini, T Stafylakis, L Burget - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to …

Cited by 12 · Related articles · All 2 versions

Target-speaker voice activity detection via sequence-to-sequence prediction

M Cheng, W Wang, Y Zhang, X Qin… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Target-speaker voice activity detection is currently a promising approach for speaker
diarization in complex acoustic environments. This paper presents a novel Sequence-to …

Cited by 34 · Related articles · All 4 versions

Probing acoustic representations for phonetic properties

D Ma, N Ryant, M Liberman - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Pre-trained acoustic representations such as wav2vec and DeCoAR have attained
impressive word error rates (WER) for speech recognition benchmarks, particularly when …

Cited by 53 · Related articles · All 4 versions