A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings

L Serafini, S Cornell, G Morrone, E Zovato… - Computer Speech & …, 2023 - Elsevier
We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …

Target-speaker voice activity detection via sequence-to-sequence prediction

M Cheng, W Wang, Y Zhang, X Qin… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Target-speaker voice activity detection is currently a promising approach for speaker
diarization in complex acoustic environments. This paper presents a novel Sequence-to …

Supervised hierarchical clustering using graph neural networks for speaker diarization

P Singh, A Kaul, S Ganapathy - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Conventional methods for speaker diarization involve windowing an audio file into short
segments to extract speaker embeddings, followed by an unsupervised clustering of the …

Multi-input multi-output target-speaker voice activity detection for unified, flexible, and robust audio-visual speaker diarization

M Cheng, M Li - arXiv preprint arXiv:2401.08052, 2024 - arxiv.org
Audio-visual learning has demonstrated promising results in many classical speech tasks
(e.g., speech separation, automatic speech recognition, wake-word spotting). We believe that …

End-to-end Online Speaker Diarization with Target Speaker Tracking

W Wang, M Li - arXiv preprint arXiv:2310.08696, 2023 - arxiv.org
This paper proposes an online target speaker voice activity detection system for speaker
diarization tasks, which does not require a priori knowledge from the clustering-based …

The DKU-SMIIP diarization system for the VoxCeleb Speaker Recognition Challenge 2022

W Wang, X Qin, M Cheng, Y Zhang, K Wang… - Voxsrc Workshop, 2022 - robots.ox.ac.uk
This paper describes the DKU-SMIIP submission to the 4th track of the VoxCeleb Speaker
Recognition Challenge 2022 (VoxSRC-22). Our system contains a fused voice activity …

The DKU-DukeECE diarization system for the VoxCeleb Speaker Recognition Challenge 2022

W Wang, X Qin, M Cheng, Y Zhang, K Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper describes the DKU-DukeECE submission to the 4th track of the VoxCeleb
Speaker Recognition Challenge 2022 (VoxSRC-22). Our system contains a fused voice …

The DKU-MSXF diarization system for the VoxCeleb Speaker Recognition Challenge 2023

M Cheng, W Wang, X Qin, Y Lin, N Jiang… - National Conference on …, 2023 - Springer
This paper describes the DKU-MSXF submission to track 4 of the VoxCeleb Speaker
Recognition Challenge 2023 (VoxSRC-23). Our system pipeline contains voice activity …

Multi-target extractor and detector for unknown-number speaker diarization

CY Cheng, HS Lee, Y Tsao… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
Strong representations of target speakers can help extract important information about
speakers and detect corresponding temporal regions in multi-speaker conversations. In this …