A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings
We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …
Target-speaker voice activity detection via sequence-to-sequence prediction
Target-speaker voice activity detection is currently a promising approach for speaker
diarization in complex acoustic environments. This paper presents a novel Sequence-to …
Supervised hierarchical clustering using graph neural networks for speaker diarization
P Singh, A Kaul, S Ganapathy - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Conventional methods for speaker diarization involve windowing an audio file into short
segments to extract speaker embeddings, followed by an unsupervised clustering of the …
Multi-input multi-output target-speaker voice activity detection for unified, flexible, and robust audio-visual speaker diarization
Audio-visual learning has demonstrated promising results in many classical speech tasks
(e.g., speech separation, automatic speech recognition, wake-word spotting). We believe that …
End-to-end Online Speaker Diarization with Target Speaker Tracking
This paper proposes an online target speaker voice activity detection system for speaker
diarization tasks, which does not require a priori knowledge from the clustering-based …
The dku-smiip diarization system for the voxceleb speaker recognition challenge 2022
This paper describes the DKU-SMIIP submission to the 4th track of the VoxCeleb Speaker
Recognition Challenge 2022 (VoxSRC-22). Our system contains a fused voice activity …
The dku-dukeece diarization system for the voxceleb speaker recognition challenge 2022
This paper describes the DKU-DukeECE submission to the 4th track of the VoxCeleb
Speaker Recognition Challenge 2022 (VoxSRC-22). Our system contains a fused voice …
The dku-msxf diarization system for the voxceleb speaker recognition challenge 2023
This paper describes the DKU-MSXF submission to track 4 of the VoxCeleb Speaker
Recognition Challenge 2023 (VoxSRC-23). Our system pipeline contains voice activity …
Multi-target extractor and detector for unknown-number speaker diarization
Strong representations of target speakers can help extract important information about
speakers and detect corresponding temporal regions in multi-speaker conversations. In this …