A review of speaker diarization: Recent advances with deep learning

M Tanveer, A Rastogi, V Paliwal, MA Ganaie, AK Malik… - Neurocomputing, 2023 - Elsevier

Machine learning methods are extensively used for processing and analysing
speech signals by virtue of their performance gains over multiple domains. Deep learning …

Cited by 13 - Related articles - All 5 versions

[PDF] arxiv.org

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by 1194 - Related articles - All 5 versions

[PDF] thecvf.com

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 680 - Related articles - All 13 versions

[HTML] aip.org

A survey of sound source localization with deep learning methods

PA Grumiaux, S Kitić, L Girin, A Guérin - The Journal of the Acoustical …, 2022 - pubs.aip.org

This article is a survey of deep learning methods for single and multiple sound source
localization, with a focus on sound source localization in indoor environments, where …

Cited by 208 - Related articles - All 13 versions

[PDF] arxiv.org

The SpeakIn system for VoxCeleb speaker recognition challange 2021

M Zhao, Y Ma, M Liu, M Xu - arXiv preprint arXiv:2109.01989, 2021 - arxiv.org

This report describes our submission to the track 1 and track 2 of the VoxCeleb Speaker
Recognition Challenge 2021 (VoxSRC 2021). Both track 1 and track 2 share the same …

Cited by 79 - Related articles - All 6 versions

[PDF] arxiv.org

ECAPA-TDNN embeddings for speaker diarization

N Dawalatabad, M Ravanelli, F Grondin… - arXiv preprint arXiv …, 2021 - arxiv.org

Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural
networks can accurately capture speaker discriminative characteristics and popular deep …

Cited by 100 - Related articles - All 14 versions

[PDF] arxiv.org

M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge

F Yu, S Zhang, Y Fu, L Xie, S Zheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …

Cited by 70 - Related articles - All 3 versions

[HTML] sciencedirect.com

An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings

L Serafini, S Cornell, G Morrone, E Zovato… - Computer Speech & …, 2023 - Elsevier

We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …

Cited by 6 - Related articles - All 6 versions

[PDF] hal.science

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …

Cited by 46 - Related articles - All 18 versions

[PDF] arxiv.org

Turn-to-diarize: Online speaker diarization constrained by transformer transducer speaker turn detection

W Xia, H Lu, Q Wang, A Tripathi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

In this paper, we present a novel speaker diarization system for streaming on-device
applications. In this system, we use a transformer transducer to detect the speaker turns …

Cited by 59 - Related articles - All 5 versions