A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
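As a concrete illustration of the "who spoke when" output this review covers, the sketch below shows one common way to represent a diarization hypothesis: speaker-labeled time segments, as in RTTM files. The segment boundaries and speaker names are invented for illustration only, not taken from the review.

from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    speaker: str   # speaker label, e.g. "spk0"

# Hypothetical diarization output: "who spoke when" as labeled segments.
hypothesis = [
    Segment(0.00, 3.25, "spk0"),
    Segment(3.25, 7.50, "spk1"),
    Segment(7.00, 9.00, "spk0"),  # overlapping speech can be represented too
]

def speech_time_per_speaker(segments):
    """Total speaking time per speaker label, in seconds."""
    totals = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals

print(speech_time_per_speaker(hypothesis))  # {'spk0': 5.25, 'spk1': 4.25}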

BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition

Y Zhang, DS Park, W Han, J Qin… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …

CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings

S Watanabe, M Mandel, J Barker, E Vincent… - arXiv preprint arXiv …, 2020 - arxiv.org
Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the
6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge …

SpeechStew: Simply mix all available speech recognition data to train one large neural network

W Chan, D Park, C Lee, Y Zhang, Q Le… - arXiv preprint arXiv …, 2021 - arxiv.org
We present SpeechStew, a speech recognition model that is trained on a combination of
various publicly available speech recognition datasets: AMI, Broadcast News, Common …

End-to-end neural speaker diarization with self-attention

Y Fujita, N Kanda, S Horiguchi, Y Xue… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
Speaker diarization has been mainly developed based on the clustering of speaker
embeddings. However, the clustering-based approach has two major problems, i.e., (i) it is not …
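The snippet above contrasts end-to-end diarization with the conventional clustering of speaker embeddings. As a rough, hedged illustration of that clustering-based baseline (not the method proposed by Fujita et al.), the sketch below clusters pre-computed segment embeddings with scikit-learn; the random stand-in embeddings, the fixed speaker count, and the choice of agglomerative linkage are all assumptions.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Stand-in for per-segment speaker embeddings (e.g. x-vectors); in a real
# system these would come from a trained speaker-embedding extractor.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 256))          # 20 segments, 256-dim each
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Length-normalized vectors make Euclidean distance monotonic in cosine
# distance. The true number of speakers is assumed known here; a real
# system would have to estimate it, which is one weakness of
# clustering-based pipelines that end-to-end methods try to address.
clusterer = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = clusterer.fit_predict(embeddings)       # one speaker label per segment
print(labels)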

End-to-end neural speaker diarization with permutation-free objectives

Y Fujita, N Kanda, S Horiguchi, K Nagamatsu… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we propose a novel end-to-end neural-network-based speaker diarization
method. Unlike most existing methods, our proposed method does not have separate …
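A hedged sketch of the permutation-free idea named in the title: with frame-level speaker activity predictions, the binary cross-entropy is evaluated under every permutation of the reference speakers and the minimum is kept, so the loss does not depend on an arbitrary ordering of speaker labels. The two-speaker toy tensors below are assumptions for illustration, not the authors' implementation.

import itertools
import torch
import torch.nn.functional as F

def permutation_free_bce(pred, target):
    """Permutation-invariant BCE over frame-level speaker activities.

    pred and target have shape (frames, num_speakers) with values in [0, 1].
    The loss is computed for every permutation of the target's speaker
    columns, and the minimum over permutations is returned.
    """
    num_speakers = target.shape[1]
    losses = []
    for perm in itertools.permutations(range(num_speakers)):
        losses.append(F.binary_cross_entropy(pred, target[:, list(perm)]))
    return torch.stack(losses).min()

# Toy check: predictions equal the reference with speaker columns swapped;
# an ordinary BCE would be large, but the permutation-free loss stays small.
target = torch.tensor([[1., 0.], [1., 0.], [0., 1.], [1., 1.]])
pred = target[:, [1, 0]].clamp(1e-4, 1.0 - 1e-4)
print(permutation_free_bce(pred, target).item())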

The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios

S Cornell, M Wiesner, S Watanabe, D Raj… - arXiv preprint arXiv …, 2023 - arxiv.org
The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Rethinking evaluation in ASR: Are our models robust enough?

T Likhomanenko, Q Xu, V Pratap, P Tomasello… - arXiv preprint arXiv …, 2020 - arxiv.org
Is pushing numbers on a single benchmark valuable in automatic speech recognition?
Research results in acoustic modeling are typically evaluated based on performance on a …

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

D Raj, P Denisov, Z Chen, H Erdogan… - 2021 IEEE spoken …, 2021 - ieeexplore.ieee.org
Multi-speaker speech recognition of unsegmented recordings has diverse applications such
as meeting transcription and automatic subtitle generation. With technical advances in …