Multi-scale speaker embedding-based graph attention networks for speaker diarisation

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 154 · Related articles · All 6 versions

Graph attention-based deep embedded clustering for speaker diarization

Y Wei, H Guo, Z Ge, Z Yang - Speech Communication, 2023 - Elsevier

Deep speaker embedding extraction models have recently served as the cornerstone for
modular speaker diarization systems. However, in current modular systems, the extracted …

Cited by 3 · Related articles · All 2 versions

In search of strong embedding extractors for speaker diarisation

J Jung, HS Heo, BJ Lee, J Huh, A Brown… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Speaker embedding extractors (EEs), which map input audio to a speaker discriminant
latent space, are of paramount importance in speaker diarisation. However, there are …

Cited by 14 · Related articles · All 9 versions

ATGNN: Audio Tagging Graph Neural Network

S Singh, CJ Steinmetz, E Benetos… - IEEE Signal …, 2024 - ieeexplore.ieee.org

Deep learning models such as CNNs and Transformers have achieved impressive
performance for end-to-end audio tagging. Recent works have shown that despite stacking …

Cited by 5 · Related articles · All 6 versions

EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

SH Mun, MH Han, C Moon, NS Kim - arXiv preprint arXiv:2312.06065, 2023 - arxiv.org

In recent years, there have been studies to further improve the end-to-end neural speaker
diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel …

Cited by 1 · Related articles · All 2 versions

Encoder-decoder multimodal speaker change detection

J Jung, S Seo, HS Heo, G Kim, YJ Kim, Y Kwon… - arXiv preprint arXiv …, 2023 - arxiv.org

The task of speaker change detection (SCD), which detects points where speakers change
in an input, is essential for several applications. Several studies solved the SCD task using …

Cited by 4 · Related articles · All 6 versions

GIST-AiTeR system for the diarization task of the 2022 VoxCeleb speaker recognition challenge

D Park, Y Yu, KW Park, JW Kim, HK Kim - arXiv preprint arXiv:2209.10357, 2022 - arxiv.org

This report describes the submission system of the GIST-AiTeR team at the 2022 VoxCeleb
Speaker Recognition Challenge (VoxSRC) Track 4. Our system mainly includes speech …

Cited by 4 · Related articles · All 2 versions

Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

T Cord-Landwehr, C Boeddeker… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We propose a modified teacher-student training for the extraction of frame-wise speaker
embeddings that allows for an effective diarization of meeting scenarios containing partially …

Cited by 2 · Related articles · All 3 versions

High-resolution embedding extractor for speaker diarisation

HS Heo, Y Kwon, BJ Lee, YJ Kim… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Speaker embedding extractors significantly influence the performance of clustering-based
speaker diarisation systems. Conventionally, only one embedding is extracted from each …

Cited by 3 · Related articles · All 5 versions

Absolute decision corrupts absolutely: conservative online speaker diarisation

Y Kwon, HS Heo, BJ Lee, YJ Kim… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Our focus lies in developing an online speaker diarisation framework which demonstrates
robust performance across diverse domains. In online speaker diarisation, outputs …

Cited by 4 · Related articles · All 5 versions