Transcribe-to-diarize: Neural speaker diarization for unlimited number of speakers using...

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 142 Related articles All 6 versions

[PDF] arxiv.org

Multi-channel conversational speaker separation via neural diarization

H Taherian, DL Wang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org

When dealing with overlapped speech, the performance of automatic speech recognition
(ASR) systems substantially degrades as they are designed for single-talker speech. To …

Cited by 8 Related articles All 3 versions

[PDF] arxiv.org

Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer

Z Chen, B Han, S Wang, Y Qian - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Deep neural network-based systems have significantly improved the performance of
speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often …

Cited by 11 Related articles All 5 versions

[PDF] arxiv.org

Enhancing speaker diarization with large language models: A contextual beam search approach

TJ Park, K Dhawan, N Koluguri… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Large language models (LLMs) have shown great promise for capturing contextual
information in natural language processing tasks. We propose a novel approach to speaker …

Cited by 13 Related articles All 3 versions

[PDF] arxiv.org

Streaming speaker-attributed ASR with token-level speaker embeddings

N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR)
model that can recognize "who spoke what" with low latency even when multiple people are …

Cited by 21 Related articles All 6 versions

[PDF] arxiv.org

Meeting recognition with continuous speech separation and transcription-supported diarization

T Von Neumann, C Boeddeker… - … , Speech, and Signal …, 2024 - ieeexplore.ieee.org

We propose a modular pipeline for the single-channel separation, recognition, and
diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a …

Cited by 4 Related articles All 2 versions

[PDF] arxiv.org

One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition

S Cornell, J Jung, S Watanabe… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

This paper presents a novel framework for joint speaker diarization (SD) and automatic
speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented …

Cited by 12 Related articles All 3 versions

[PDF] arxiv.org

Diarist: Streaming Speech Translation with Speaker Diarization

M Yang, N Kanda, X Wang, J Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

End-to-end speech translation (ST) for conversation recordings involves several under-explored
challenges such as speaker diarization (SD) without accurate word time stamps …

Cited by 3 Related articles All 3 versions

[PDF] arxiv.org

MMS-MSG: A multi-purpose multi-speaker mixture signal generator

T Cord-Landwehr, T Von Neumann… - … on Acoustic Signal …, 2022 - ieeexplore.ieee.org

The scope of speech enhancement has changed from a monolithic view of single,
independent tasks, to a joint processing of complex conversational speech recordings …

Cited by 10 Related articles All 5 versions

[PDF] arxiv.org

Improving speaker diarization using semantic information: Joint pairwise constraints propagation

L Cheng, S Zheng, Q Zhang, H Wang, Y Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Speaker diarization has gained considerable attention within the speech processing research
community. Mainstream speaker diarization systems rely primarily on speakers' voice characteristics …

Cited by 4 Related articles All 2 versions