A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Online neural diarization of unlimited numbers of speakers using global and local attractors
A method to perform offline and online speaker diarization for an unlimited number of
speakers is described in this paper. End-to-end neural diarization (EEND) has achieved …
Graph attention-based deep embedded clustering for speaker diarization
Y Wei, H Guo, Z Ge, Z Yang - Speech Communication, 2023 - Elsevier
Deep speaker embedding extraction models have recently served as the cornerstone for
modular speaker diarization systems. However, in current modular systems, the extracted …
Supervised hierarchical clustering using graph neural networks for speaker diarization
P Singh, A Kaul, S Ganapathy - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Conventional methods for speaker diarization involve windowing an audio file into short
segments to extract speaker embeddings, followed by an unsupervised clustering of the …
From Modular to End-to-End Speaker Diarization
F Landini - arXiv preprint arXiv:2407.08752, 2024 - arxiv.org
Speaker diarization is usually referred to as the task that determines "who spoke when" in a
recording. Until a few years ago, all competitive approaches were modular. Systems based …
Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
P Singh, S Ganapathy - arXiv preprint arXiv:2401.12850, 2024 - arxiv.org
Speaker diarization, the task of segmenting an audio recording based on speaker identity,
constitutes an important speech pre-processing step for several downstream applications …
End-to-end integration of speech separation and voice activity detection for low-latency diarization of telephone conversations
Recent works show that speech separation guided diarization (SSGD) is an increasingly
promising direction, mainly thanks to the recent progress in speech separation. It performs …
Speaker conditioned acoustic modeling for multi-speaker conversational ASR
SR Chetupalli, S Ganapathy - arXiv preprint arXiv:2104.01882, 2021 - arxiv.org
In this paper, we propose a novel approach for the transcription of speech conversations
with natural speaker overlap, from single channel speech recordings. The proposed model …
Graph Clustering Approaches for Speaker Diarization of Conversational Speech
P Singh - 2023 - leap.ee.iisc.ac.in
In this era of advanced machine intelligence, real-world speech applications need to be
equipped to deal with conversations involving multiple speakers. An essential first step in …
Advancing Deep-Generated Speech and Defending against Its Misuse
Z Cai - 2023 - search.proquest.com
Deep learning has revolutionized speech generation, spanning synthesis areas such as text-
to-speech and voice conversion, leading to diverse advancements. On the one hand, when …