Guided source separation meets a strong ASR backend: Hitachi/Paderborn University joint investiga...

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Cited by 326 · Related articles · All 7 versions

End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors

S Horiguchi, Y Fujita, S Watanabe, Y Xue… - arXiv preprint arXiv …, 2020 - arxiv.org

End-to-end speaker diarization for an unknown number of speakers is addressed in this
paper. Recently proposed end-to-end speaker diarization outperformed conventional …

Cited by 175 · Related articles · All 11 versions

VarArray meets t-SOT: Advancing the state of the art of streaming distant conversational speech recognition

N Kanda, J Wu, X Wang, Z Chen, J Li… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

This paper presents a novel streaming automatic speech recognition (ASR) framework for
multi-talker overlapping speech captured by a distant microphone array with an arbitrary …

Cited by 13 · Related articles · All 3 versions

Microsoft speaker diarization system for the VoxCeleb Speaker Recognition Challenge 2020

X Xiao, N Kanda, Z Chen, T Zhou… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

This paper describes the Microsoft speaker diarization system for monaural multi-talker
recordings in the wild, evaluated at the diarization track of the VoxCeleb Speaker …

Cited by 71 · Related articles · All 3 versions

GPU-accelerated guided source separation for meeting transcription

D Raj, D Povey, S Khudanpur - arXiv preprint arXiv:2212.05271, 2022 - arxiv.org

Guided source separation (GSS) is a type of target-speaker extraction method that relies on
pre-computed speaker activities and blind source separation to perform front-end …

Cited by 29 · Related articles · All 10 versions

Streaming multi-talker ASR with token-level serialized output training

N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper proposes a token-level serialized output training (t-SOT), a novel framework for
streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi …

Cited by 44 · Related articles · All 6 versions

Advances in online audio-visual meeting transcription

T Yoshioka, I Abramovski, C Aksoylar… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org

This paper describes a system that generates speaker-annotated transcripts of meetings by
using a microphone array and a 360-degree camera. The hallmark of the system is its ability …

Cited by 85 · Related articles · All 6 versions

Encoder-decoder based attractors for end-to-end neural diarization

S Horiguchi, Y Fujita, S Watanabe… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org

This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …

Cited by 46 · Related articles · All 6 versions

Jointly optimal denoising, dereverberation, and source separation

T Nakatani, C Boeddeker, K Kinoshita… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org

This article proposes methods that can optimize a Convolutional BeamFormer (CBF) for
jointly performing denoising, dereverberation, and source separation (DN+DR+SS) in a …

Cited by 60 · Related articles · All 6 versions

The STC system for the CHiME-6 challenge

I Medennikov, M Korenevsky, T Prisyach… - … 2020 Workshop on …, 2020 - isca-archive.org

This paper is a description of the Speech Technology Center (STC) systems for the CHiME-6
challenge aimed at multimicrophone multi-speaker speech recognition and diarization in a …

Cited by 60 · Related articles · All 6 versions