End-to-end speaker segmentation for overlap-aware resegmentation

H Bredin, A Laurent - arXiv preprint arXiv:2104.04045, 2021 - arxiv.org
Speaker segmentation consists in partitioning a conversation between one or more
speakers into speaker turns. Usually addressed as the late combination of three sub-tasks …
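To make "partitioning a conversation into speaker turns" concrete, here is a minimal sketch, assuming frame-level 0/1 speaker activity is already available as a frames × speakers matrix. The function name `activity_to_turns` and the `frame_step` parameter are hypothetical and are not taken from the paper, which addresses segmentation end to end. Each speaker column is scanned independently, so overlapping turns are preserved.

```python
from typing import List, Tuple


def activity_to_turns(
    activity: List[List[int]], frame_step: float
) -> List[Tuple[float, float, int]]:
    """Convert a frames x speakers 0/1 matrix into sorted (start, end, speaker) turns.

    Hypothetical helper: frame_step is the assumed duration of one frame in seconds.
    Speakers are processed independently, so overlapping speech yields overlapping turns.
    """
    turns = []
    num_speakers = len(activity[0]) if activity else 0
    for spk in range(num_speakers):
        start = None
        for t, frame in enumerate(activity):
            if frame[spk] and start is None:
                start = t  # a turn for this speaker begins at frame t
            elif not frame[spk] and start is not None:
                turns.append((start * frame_step, t * frame_step, spk))
                start = None  # the turn ended at frame t
        if start is not None:  # the turn runs until the last frame
            turns.append((start * frame_step, len(activity) * frame_step, spk))
    return sorted(turns)


if __name__ == "__main__":
    activity = [
        [1, 0],
        [1, 0],
        [1, 1],  # both speakers active: overlapped frame
        [0, 1],
        [0, 0],
    ]
    print(activity_to_turns(activity, frame_step=0.5))
```

With a 0.5 s frame step, the example yields one turn for speaker 0 from 0.0 s to 1.5 s and an overlapping turn for speaker 1 from 1.0 s to 2.0 s.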

The SpeakIn system for VoxCeleb speaker recognition challenge 2021

M Zhao, Y Ma, M Liu, M Xu - arXiv preprint arXiv:2109.01989, 2021 - arxiv.org
This report describes our submission to the track 1 and track 2 of the VoxCeleb Speaker
Recognition Challenge 2021 (VoxSRC 2021). Both track 1 and track 2 share the same …

Powerset multi-class cross entropy loss for neural speaker diarization

A Plaquet, H Bredin - arXiv preprint arXiv:2310.13025, 2023 - arxiv.org
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …
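The snippet contrasts frame-wise multi-label classification (one binary decision per speaker per frame) with the powerset multi-class formulation named in the title. The sketch below illustrates that idea under stated assumptions: every speaker subset of size 0 to `max_overlap` becomes one class, and the class ordering produced by `itertools.combinations` is an arbitrary choice. The function names are hypothetical, not the paper's code. Frames with more active speakers than `max_overlap` have no class and raise `KeyError`.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_powerset(num_speakers: int, max_overlap: int) -> List[Tuple[int, ...]]:
    """Enumerate every speaker subset of size 0..max_overlap; each subset is one class."""
    classes: List[Tuple[int, ...]] = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def multilabel_to_powerset(
    frames: List[List[int]], classes: List[Tuple[int, ...]]
) -> List[int]:
    """Map each frame's 0/1 speaker vector to the index of its subset class.

    Raises KeyError if a frame has more active speakers than the powerset allows.
    """
    index: Dict[Tuple[int, ...], int] = {c: i for i, c in enumerate(classes)}
    return [index[tuple(s for s, active in enumerate(f) if active)] for f in frames]


def powerset_to_multilabel(
    labels: List[int], classes: List[Tuple[int, ...]], num_speakers: int
) -> List[List[int]]:
    """Invert the mapping: turn class indices back into 0/1 speaker vectors."""
    out = []
    for label in labels:
        vec = [0] * num_speakers
        for s in classes[label]:
            vec[s] = 1
        out.append(vec)
    return out


if __name__ == "__main__":
    classes = build_powerset(num_speakers=3, max_overlap=2)
    frames = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]]
    labels = multilabel_to_powerset(frames, classes)
    print(len(classes), labels)
    print(powerset_to_multilabel(labels, classes, num_speakers=3))
```

With 3 speakers and at most 2 overlapping, there are 1 + 3 + 3 = 7 classes. Each frame then carries exactly one class label, so a standard softmax cross-entropy can be applied per frame in place of per-speaker binary losses.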

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …

Encoder-decoder based attractors for end-to-end neural diarization

S Horiguchi, Y Fujita, S Watanabe… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …

How might we create better benchmarks for speech recognition?

A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

K Kinoshita, M Delcroix, N Tawara - arXiv preprint arXiv:2105.09040, 2021 - arxiv.org
Recently, we proposed a novel speaker diarization method called End-to-End-Neural-
Diarization-vector clustering (EEND-vector clustering) that integrates clustering-based and …

DiaPer: End-to-end neural diarization with perceiver-based attractors

F Landini, T Stafylakis, L Burget - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to …

Target-speaker voice activity detection via sequence-to-sequence prediction

M Cheng, W Wang, Y Zhang, X Qin… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Target-speaker voice activity detection is currently a promising approach for speaker
diarization in complex acoustic environments. This paper presents a novel Sequence-to …

Probing acoustic representations for phonetic properties

D Ma, N Ryant, M Liberman - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Pre-trained acoustic representations such as wav2vec and DeCoAR have attained
impressive word error rates (WER) for speech recognition benchmarks, particularly when …