A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

End-to-end speaker segmentation for overlap-aware resegmentation

H Bredin, A Laurent - arXiv preprint arXiv:2104.04045, 2021 - arxiv.org
Speaker segmentation consists in partitioning a conversation between one or more
speakers into speaker turns. Usually addressed as the late combination of three sub-tasks …

Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario

I Medennikov, M Korenevsky, T Prisyach… - arXiv preprint arXiv …, 2020 - arxiv.org
Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used
clustering-based diarization approaches perform rather poorly in such conditions, mainly …

End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors

S Horiguchi, Y Fujita, S Watanabe, Y Xue… - arXiv preprint arXiv …, 2020 - arxiv.org
End-to-end speaker diarization for an unknown number of speakers is addressed in this
paper. Recently proposed end-to-end speaker diarization outperformed conventional …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios

S Cornell, M Wiesner, S Watanabe, D Raj… - arXiv preprint arXiv …, 2023 - arxiv.org
The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …

M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge

F Yu, S Zhang, Y Fu, L Xie, S Zheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …

Powerset multi-class cross entropy loss for neural speaker diarization

A Plaquet, H Bredin - arXiv preprint arXiv:2310.13025, 2023 - arxiv.org
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …