A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Graph attention-based deep embedded clustering for speaker diarization

Y Wei, H Guo, Z Ge, Z Yang - Speech Communication, 2023 - Elsevier
Deep speaker embedding extraction models have recently served as the cornerstone for
modular speaker diarization systems. However, in current modular systems, the extracted …

In search of strong embedding extractors for speaker diarisation

J Jung, HS Heo, BJ Lee, J Huh, A Brown… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Speaker embedding extractors (EEs), which map input audio to a speaker discriminant
latent space, are of paramount importance in speaker diarisation. However, there are …

ATGNN: Audio Tagging Graph Neural Network

S Singh, CJ Steinmetz, E Benetos… - IEEE Signal …, 2024 - ieeexplore.ieee.org
Deep learning models such as CNNs and Transformers have achieved impressive
performance for end-to-end audio tagging. Recent works have shown that despite stacking …

EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

SH Mun, MH Han, C Moon, NS Kim - arXiv preprint arXiv:2312.06065, 2023 - arxiv.org
In recent years, there have been studies to further improve the end-to-end neural speaker
diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel …

Encoder-decoder multimodal speaker change detection

J Jung, S Seo, HS Heo, G Kim, YJ Kim, Y Kwon… - arXiv preprint arXiv …, 2023 - arxiv.org
The task of speaker change detection (SCD), which detects points where speakers change
in an input, is essential for several applications. Several studies solved the SCD task using …

GIST-AiTeR system for the diarization task of the 2022 VoxCeleb speaker recognition challenge

D Park, Y Yu, KW Park, JW Kim, HK Kim - arXiv preprint arXiv:2209.10357, 2022 - arxiv.org
This report describes the submission system of the GIST-AiTeR team at the 2022 VoxCeleb
Speaker Recognition Challenge (VoxSRC) Track 4. Our system mainly includes speech …

Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

T Cord-Landwehr, C Boeddeker… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We propose a modified teacher-student training for the extraction of frame-wise speaker
embeddings that allows for an effective diarization of meeting scenarios containing partially …

High-resolution embedding extractor for speaker diarisation

HS Heo, Y Kwon, BJ Lee, YJ Kim… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Speaker embedding extractors significantly influence the performance of clustering-based
speaker diarisation systems. Conventionally, only one embedding is extracted from each …

Absolute decision corrupts absolutely: conservative online speaker diarisation

Y Kwon, HS Heo, BJ Lee, YJ Kim… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Our focus lies in developing an online speaker diarisation framework which demonstrates
robust performance across diverse domains. In online speaker diarisation, outputs …