A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Multi-channel conversational speaker separation via neural diarization
H Taherian, DL Wang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
When dealing with overlapped speech, the performance of automatic speech recognition
(ASR) systems substantially degrades as they are designed for single-talker speech. To …
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer
Deep neural network-based systems have significantly improved the performance of
speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often …
Enhancing speaker diarization with large language models: A contextual beam search approach
Large language models (LLMs) have shown great promise for capturing contextual
information in natural language processing tasks. We propose a novel approach to speaker …
Streaming speaker-attributed ASR with token-level speaker embeddings
This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR)
model that can recognize "who spoke what" with low latency even when multiple people are …
Meeting recognition with continuous speech separation and transcription-supported diarization
T Von Neumann, C Boeddeker… - … , Speech, and Signal …, 2024 - ieeexplore.ieee.org
We propose a modular pipeline for the single-channel separation, recognition, and
diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a …
One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition
This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …
Diarist: Streaming Speech Translation with Speaker Diarization
End-to-end speech translation (ST) for conversation recordings involves several
under-explored challenges such as speaker diarization (SD) without accurate word time stamps …
MMS-MSG: A multi-purpose multi-speaker mixture signal generator
T Cord-Landwehr, T Von Neumann… - … on Acoustic Signal …, 2022 - ieeexplore.ieee.org
The scope of speech enhancement has changed from a monolithic view of single,
independent tasks, to a joint processing of complex conversational speech recordings …
Improving speaker diarization using semantic information: Joint pairwise constraints propagation
Speaker diarization has gained considerable attention within the speech processing research
community. Mainstream speaker diarization systems rely primarily on speakers' voice characteristics …