A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Multi-channel conversational speaker separation via neural diarization

H Taherian, DL Wang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
When dealing with overlapped speech, the performance of automatic speech recognition
(ASR) systems substantially degrades as they are designed for single-talker speech. To …

Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer

Z Chen, B Han, S Wang, Y Qian - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Deep neural network-based systems have significantly improved the performance of
speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often …

Enhancing speaker diarization with large language models: A contextual beam search approach

TJ Park, K Dhawan, N Koluguri… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) have shown great promise for capturing contextual
information in natural language processing tasks. We propose a novel approach to speaker …

Streaming speaker-attributed ASR with token-level speaker embeddings

N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR)
model that can recognize "who spoke what" with low latency even when multiple people are …

Meeting recognition with continuous speech separation and transcription-supported diarization

T Von Neumann, C Boeddeker… - … , Speech, and Signal …, 2024 - ieeexplore.ieee.org
We propose a modular pipeline for the single-channel separation, recognition, and
diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a …

One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition

S Cornell, J Jung, S Watanabe… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …

Diarist: Streaming Speech Translation with Speaker Diarization

M Yang, N Kanda, X Wang, J Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
End-to-end speech translation (ST) for conversation recordings involves several
under-explored challenges such as speaker diarization (SD) without accurate word time stamps …

MMS-MSG: A multi-purpose multi-speaker mixture signal generator

T Cord-Landwehr, T Von Neumann… - … on Acoustic Signal …, 2022 - ieeexplore.ieee.org
The scope of speech enhancement has changed from a monolithic view of single,
independent tasks, to a joint processing of complex conversational speech recordings …

Improving speaker diarization using semantic information: Joint pairwise constraints propagation

L Cheng, S Zheng, Q Zhang, H Wang, Y Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Speaker diarization has gained considerable attention within the speech processing research
community. Mainstream speaker diarization relies primarily on speakers' voice characteristics …