A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors
End-to-end speaker diarization for an unknown number of speakers is addressed in this
paper. Recently proposed end-to-end speaker diarization outperformed conventional …
VarArray meets t-SOT: Advancing the state of the art of streaming distant conversational speech recognition
This paper presents a novel streaming automatic speech recognition (ASR) framework for
multi-talker overlapping speech captured by a distant microphone array with an arbitrary …
Microsoft speaker diarization system for the VoxCeleb speaker recognition challenge 2020
This paper describes the Microsoft speaker diarization system for monaural multi-talker
recordings in the wild, evaluated at the diarization track of the VoxCeleb Speaker …
GPU-accelerated guided source separation for meeting transcription
Guided source separation (GSS) is a type of target-speaker extraction method that relies on
pre-computed speaker activities and blind source separation to perform front-end …
Streaming multi-talker ASR with token-level serialized output training
This paper proposes a token-level serialized output training (t-SOT), a novel framework for
streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi …
Advances in online audio-visual meeting transcription
T Yoshioka, I Abramovski, C Aksoylar… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
This paper describes a system that generates speaker-annotated transcripts of meetings by
using a microphone array and a 360-degree camera. The hallmark of the system is its ability …
Encoder-decoder based attractors for end-to-end neural diarization
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …
Jointly optimal denoising, dereverberation, and source separation
This article proposes methods that can optimize a Convolutional BeamFormer (CBF) for
jointly performing denoising, dereverberation, and source separation (DN+DR+SS) in a …
The STC system for the CHiME-6 challenge
I Medennikov, M Korenevsky, T Prisyach… - … 2020 Workshop on …, 2020 - isca-archive.org
This paper is a description of the Speech Technology Center (STC) systems for the CHiME-6
challenge aimed at multimicrophone multi-speaker speech recognition and diarization in a …