A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Encoder-decoder based attractors for end-to-end neural diarization
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …
Improving multiparty interactions with a robot using large language models
Speaker diarization is a key component of systems that support multiparty interactions of
co-located users, such as meeting facilitation robots. The goal is to identify who spoke what …
DiarizationLM: Speaker diarization post-processing with large language models
In this paper, we introduce DiarizationLM, a framework to leverage large language models
(LLM) to post-process the outputs from a speaker diarization system. Various goals can be …
Investigation of end-to-end speaker-attributed ASR for continuous multi-talker recordings
Recently, an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR)
model was proposed as a joint model of speaker counting, speech recognition and speaker …
Transcribe-to-diarize: Neural speaker diarization for unlimited number of speakers using end-to-end speaker-attributed ASR
This paper presents Transcribe-to-Diarize, a new approach for neural speaker diarization
that uses an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR) …
Enhancing speaker diarization with large language models: A contextual beam search approach
Large language models (LLMs) have shown great promise for capturing contextual
information in natural language processing tasks. We propose a novel approach to speaker …
Lexical speaker error correction: Leveraging language models for speaker diarization error correction
Speaker diarization (SD) is typically used with an automatic speech recognition (ASR)
system to ascribe speaker labels to recognized words. The conventional approach …
ASR-aware end-to-end neural diarization
We present a Conformer-based end-to-end neural diarization (EEND) model that uses both
acoustic input and features derived from an automatic speech recognition (ASR) model. Two …
Diarist: Streaming Speech Translation with Speaker Diarization
End-to-end speech translation (ST) for conversation recordings involves several
under-explored challenges such as speaker diarization (SD) without accurate word time stamps …