A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Online neural diarization of unlimited numbers of speakers using global and local attractors

S Horiguchi, S Watanabe, P García… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
A method to perform offline and online speaker diarization for an unlimited number of
speakers is described in this paper. End-to-end neural diarization (EEND) has achieved …

Graph attention-based deep embedded clustering for speaker diarization

Y Wei, H Guo, Z Ge, Z Yang - Speech Communication, 2023 - Elsevier
Deep speaker embedding extraction models have recently served as the cornerstone for
modular speaker diarization systems. However, in current modular systems, the extracted …

Supervised hierarchical clustering using graph neural networks for speaker diarization

P Singh, A Kaul, S Ganapathy - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Conventional methods for speaker diarization involve windowing an audio file into short
segments to extract speaker embeddings, followed by an unsupervised clustering of the …

From Modular to End-to-End Speaker Diarization

F Landini - arXiv preprint arXiv:2407.08752, 2024 - arxiv.org
Speaker diarization is usually referred to as the task that determines "who spoke when" in a
recording. Until a few years ago, all competitive approaches were modular. Systems based …

Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization

P Singh, S Ganapathy - arXiv preprint arXiv:2401.12850, 2024 - arxiv.org
Speaker diarization, the task of segmenting an audio recording based on speaker identity,
constitutes an important speech pre-processing step for several downstream applications …

End-to-end integration of speech separation and voice activity detection for low-latency diarization of telephone conversations

G Morrone, S Cornell, L Serafini, E Zovato, A Brutti… - Speech …, 2024 - Elsevier
Recent works show that speech separation guided diarization (SSGD) is an increasingly
promising direction, mainly thanks to the recent progress in speech separation. It performs …

Speaker conditioned acoustic modeling for multi-speaker conversational ASR

SR Chetupalli, S Ganapathy - arXiv preprint arXiv:2104.01882, 2021 - arxiv.org
In this paper, we propose a novel approach for the transcription of speech conversations
with natural speaker overlap, from single channel speech recordings. The proposed model …

Graph Clustering Approaches for Speaker Diarization of Conversational Speech

P Singh - 2023 - leap.ee.iisc.ac.in
In this era of advanced machine intelligence, real-world speech applications need to be
equipped to deal with conversations involving multiple speakers. An essential first step in …

Advancing Deep-Generated Speech and Defending against Its Misuse

Z Cai - 2023 - search.proquest.com
Deep learning has revolutionized speech generation, spanning synthesis areas such as
text-to-speech and voice conversion, leading to diverse advancements. On the one hand, when …