Powerset multi-class cross entropy loss for neural speaker diarization

A Plaquet, H Bredin - arXiv preprint arXiv:2310.13025, 2023 - arxiv.org
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …

Optimization of RNN-based speech activity detection

G Gelly, JL Gauvain - IEEE/ACM Transactions on Audio …, 2017 - ieeexplore.ieee.org
Speech activity detection (SAD) is an essential component of automatic speech recognition
systems impacting the overall system performance. This paper investigates an optimization …

A study on automatic speech recognition

S Benkerzaz, Y Elmir, A Dennai - Journal of Information Technology …, 2019 - academia.edu
Speech is an easy and usable technique of communication between humans, but nowadays
humans are not limited to connecting to each other but even to the different machines in our …

pyannote.audio speaker diarization pipeline at VoxSRC 2023

S Baroudi, H Bredin, A Plaquet, T Pellegrini - The VoxCeleb Speaker …, 2023 - mmai.io
This technical report describes the submission of team pyannote to the VoxSRC 2023
speaker diarization challenge. It relies on 3 stages: local end-to-end neural speaker …

A study on automatic speech recognition systems

H Ibrahim, A Varol - … on Digital Forensics and Security (ISDFS), 2020 - ieeexplore.ieee.org
Speech recognition is a technique that enables machines to automatically identify the
human voice through speech signals. In other words, it helps create a communication link …

The first official REPERE evaluation

O Galibert, J Kahn - First Workshop on Speech, Language and …, 2013 - isca-archive.org
The REPERE Challenge aims to support research on people recognition in multimodal
conditions. Following a 2012 dryrun [1], the first official evaluation of systems has been …

Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges

V Mingote, A Ortega, A Miguel, E Lleida - arXiv preprint arXiv:2409.05659, 2024 - arxiv.org
Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …

FABIOLE, a speech database for forensic speaker comparison

M Ajili, JF Bonastre, J Kahn, S Rossato… - Proceedings of the …, 2016 - aclanthology.org
A speech database has been collected for use to highlight the importance of “speaker factor”
in forensic voice comparison. FABIOLE has been created during the FABIOLE project …

Unsupervised speaker identification in TV broadcast based on written names

J Poignant, L Besacier, G Quénot - IEEE/ACM Transactions on …, 2014 - ieeexplore.ieee.org
Identifying speakers in TV broadcast in an unsupervised way (i.e., without biometric models)
is a solution for avoiding costly annotations. Existing methods usually use pronounced …