A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Encoder-decoder based attractors for end-to-end neural diarization

S Horiguchi, Y Fujita, S Watanabe… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …

Improving multiparty interactions with a robot using large language models

P Murali, I Steenstra, HS Yun, A Shamekhi… - Extended Abstracts of …, 2023 - dl.acm.org
Speaker diarization is a key component of systems that support multiparty interactions of
co-located users, such as meeting facilitation robots. The goal is to identify who spoke what …

DiarizationLM: Speaker diarization post-processing with large language models

Q Wang, Y Huang, G Zhao, E Clark, W Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce DiarizationLM, a framework to leverage large language models
(LLM) to post-process the outputs from a speaker diarization system. Various goals can be …

Investigation of end-to-end speaker-attributed ASR for continuous multi-talker recordings

N Kanda, X Chang, Y Gaur, X Wang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Recently, an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR)
model was proposed as a joint model of speaker counting, speech recognition and speaker …

Transcribe-to-diarize: Neural speaker diarization for unlimited number of speakers using end-to-end speaker-attributed ASR

N Kanda, X Xiao, Y Gaur, X Wang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This paper presents Transcribe-to-Diarize, a new approach for neural speaker diarization
that uses an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR) …

Enhancing speaker diarization with large language models: A contextual beam search approach

TJ Park, K Dhawan, N Koluguri… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) have shown great promise for capturing contextual
information in natural language processing tasks. We propose a novel approach to speaker …

Lexical speaker error correction: Leveraging language models for speaker diarization error correction

R Paturi, S Srinivasan, X Li - arXiv preprint arXiv:2306.09313, 2023 - arxiv.org
Speaker diarization (SD) is typically used with an automatic speech recognition (ASR)
system to ascribe speaker labels to recognized words. The conventional approach …

ASR-aware end-to-end neural diarization

A Khare, E Han, Y Yang… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
We present a Conformer-based end-to-end neural diarization (EEND) model that uses both
acoustic input and features derived from an automatic speech recognition (ASR) model. Two …

DiariST: Streaming Speech Translation with Speaker Diarization

M Yang, N Kanda, X Wang, J Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
End-to-end speech translation (ST) for conversation recordings involves several
under-explored challenges such as speaker diarization (SD) without accurate word time stamps …