A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Graph attention-based deep embedded clustering for speaker diarization
Y Wei, H Guo, Z Ge, Z Yang - Speech Communication, 2023 - Elsevier
Deep speaker embedding extraction models have recently served as the cornerstone for
modular speaker diarization systems. However, in current modular systems, the extracted …
In search of strong embedding extractors for speaker diarisation
Speaker embedding extractors (EEs), which map input audio to a speaker discriminant
latent space, are of paramount importance in speaker diarisation. However, there are …
ATGNN: Audio Tagging Graph Neural Network
Deep learning models such as CNNs and Transformers have achieved impressive
performance for end-to-end audio tagging. Recent works have shown that despite stacking …
EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings
In recent years, there have been studies to further improve the end-to-end neural speaker
diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel …
Encoder-decoder multimodal speaker change detection
The task of speaker change detection (SCD), which detects points where speakers change
in an input, is essential for several applications. Several studies solved the SCD task using …
GIST-AiTeR system for the diarization task of the 2022 VoxCeleb speaker recognition challenge
This report describes the submission system of the GIST-AiTeR team at the 2022 VoxCeleb
Speaker Recognition Challenge (VoxSRC) Track 4. Our system mainly includes speech …
Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios
T Cord-Landwehr, C Boeddeker… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We propose a modified teacher-student training for the extraction of frame-wise speaker
embeddings that allows for an effective diarization of meeting scenarios containing partially …
High-resolution embedding extractor for speaker diarisation
Speaker embedding extractors significantly influence the performance of clustering-based
speaker diarisation systems. Conventionally, only one embedding is extracted from each …
Absolute decision corrupts absolutely: conservative online speaker diarisation
Our focus lies in developing an online speaker diarisation framework which demonstrates
robust performance across diverse domains. In online speaker diarisation, outputs …