End-to-end speaker segmentation for overlap-aware resegmentation
Speaker segmentation consists in partitioning a conversation between one or more
speakers into speaker turns. Usually addressed as the late combination of three sub-tasks …
The speakin system for voxceleb speaker recognition challange 2021
M Zhao, Y Ma, M Liu, M Xu - arXiv preprint arXiv:2109.01989, 2021 - arxiv.org
This report describes our submission to the track 1 and track 2 of the VoxCeleb Speaker
Recognition Challenge 2021 (VoxSRC 2021). Both track 1 and track 2 share the same …
Powerset multi-class cross entropy loss for neural speaker diarization
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …
pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe
H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …
Encoder-decoder based attractors for end-to-end neural diarization
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …
How might we create better benchmarks for speech recognition?
A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …
Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech
Recently, we proposed a novel speaker diarization method called End-to-End-Neural-
Diarization-vector clustering (EEND-vector clustering) that integrates clustering-based and …
Diaper: End-to-end neural diarization with perceiver-based attractors
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to …
Target-speaker voice activity detection via sequence-to-sequence prediction
Target-speaker voice activity detection is currently a promising approach for speaker
diarization in complex acoustic environments. This paper presents a novel Sequence-to …
Probing acoustic representations for phonetic properties
Pre-trained acoustic representations such as wav2vec and DeCoAR have attained
impressive word error rates (WER) for speech recognition benchmarks, particularly when …