Powerset multi-class cross entropy loss for neural speaker diarization
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …
pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe
H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …
Optimization of RNN-based speech activity detection
G Gelly, JL Gauvain - IEEE/ACM Transactions on Audio …, 2017 - ieeexplore.ieee.org
Speech activity detection (SAD) is an essential component of automatic speech recognition
systems impacting the overall system performance. This paper investigates an optimization …
A study on automatic speech recognition
S Benkerzaz, Y Elmir, A Dennai - Journal of Information Technology …, 2019 - academia.edu
Speech is an easy and usable technique of communication between humans, but nowadays
humans are not limited to connecting to each other but even to the different machines in our …
pyannote.audio speaker diarization pipeline at VoxSRC 2023
This technical report describes the submission of team pyannote to the VoxSRC 2023
speaker diarization challenge. It relies on 3 stages: local end-to-end neural speaker …
A study on automatic speech recognition systems
H Ibrahim, A Varol - … on Digital Forensics and Security (ISDFS), 2020 - ieeexplore.ieee.org
Speech recognition is a technique that enables machines to automatically identify the
human voice through speech signals. In other words, it helps create a communication link …
The first official REPERE evaluation
O Galibert, J Kahn - First Workshop on Speech, Language and …, 2013 - isca-archive.org
The REPERE Challenge aims to support research on people recognition in multimodal
conditions. Following a 2012 dryrun [1], the first official evaluation of systems has been …
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges
Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …
Fabiole, a speech database for forensic speaker comparison
A speech database has been collected for use to highlight the importance of “speaker factor”
in forensic voice comparison. FABIOLE has been created during the FABIOLE project …
Unsupervised speaker identification in TV broadcast based on written names
Identifying speakers in TV broadcast in an unsupervised way (i.e., without biometric models)
is a solution for avoiding costly annotations. Existing methods usually use pronounced …