End-to-end neural speaker diarization with self-attention

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

被引用次数：118 相关文章所有 6 个版本

[PDF] arxiv.org

Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier

Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

被引用次数：358 相关文章所有 9 个版本

[PDF] arxiv.org

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

被引用次数：339 相关文章所有 7 个版本

[PDF] arxiv.org

End-to-end speaker segmentation for overlap-aware resegmentation

H Bredin, A Laurent - arXiv preprint arXiv:2104.04045, 2021 - arxiv.org

Speaker segmentation consists in partitioning a conversation between one or more
speakers into speaker turns. Usually addressed as the late combination of three sub-tasks …

被引用次数：159 相关文章所有 17 个版本

[PDF] arxiv.org

Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario

I Medennikov, M Korenevsky, T Prisyach… - arXiv preprint arXiv …, 2020 - arxiv.org

Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used
clustering-based diarization approaches perform rather poorly in such conditions, mainly …

被引用次数：195 相关文章所有 10 个版本

[PDF] arxiv.org

End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors

S Horiguchi, Y Fujita, S Watanabe, Y Xue… - arXiv preprint arXiv …, 2020 - arxiv.org

End-to-end speaker diarization for an unknown number of speakers is addressed in this
paper. Recently proposed end-to-end speaker diarization outperformed conventional …

被引用次数：179 相关文章所有 11 个版本

[PDF] arxiv.org

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org

The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

被引用次数：106 相关文章所有 8 个版本

[PDF] arxiv.org

The chime-7 dasr challenge: Distant meeting transcription with multiple devices in diverse scenarios

S Cornell, M Wiesner, S Watanabe, D Raj… - arXiv preprint arXiv …, 2023 - arxiv.org

The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …

被引用次数：33 相关文章所有 7 个版本

[PDF] arxiv.org

M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge

F Yu, S Zhang, Y Fu, L Xie, S Zheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …

被引用次数：74 相关文章所有 3 个版本

[PDF] arxiv.org

Powerset multi-class cross entropy loss for neural speaker diarization

A Plaquet, H Bredin - arXiv preprint arXiv:2310.13025, 2023 - arxiv.org

Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …

被引用次数：36 相关文章所有 10 个版本