VarArray meets t-SOT: Advancing the state of the art of streaming distant conversational speech recognition
This paper presents a novel streaming automatic speech recognition (ASR) framework for
multi-talker overlapping speech captured by a distant microphone array with an arbitrary …
The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios
The CHiME challenges have played a significant role in the development and evaluation of
robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR …
An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings
We performed an experimental review of current diarization systems for the conversational
telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms …
Powerset multi-class cross entropy loss for neural speaker diarization
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work
has been addressing speaker diarization as a frame-wise multi-label classification problem …
pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe
H Bredin - 24th INTERSPEECH Conference (INTERSPEECH …, 2023 - hal.science
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version
2.1 introduces a major overhaul of pyannote.audio default speaker diarization pipeline …
GPU-accelerated guided source separation for meeting transcription
Guided source separation (GSS) is a type of target-speaker extraction method that relies on
pre-computed speaker activities and blind source separation to perform front-end …
UNSSOR: unsupervised neural speech separation by leveraging over-determined training mixtures
ZQ Wang, S Watanabe - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In reverberant conditions with multiple concurrent speakers, each microphone acquires a
mixture signal of multiple speakers at a different location. In over-determined conditions …
Cross-channel attention-based target speaker voice activity detection: Experimental results for the M2MeT challenge
DukeECE. As the highly overlapped speech exists in the dataset, we employ an x-vector-
based target-speaker voice activity detection (TS-VAD) to find the overlap between …
Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge
(M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech …
DiaPer: End-to-end neural diarization with perceiver-based attractors
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to
their limitations, mainly regarding overlapped speech and cumbersome pipelines, endto …