An exploration of self-supervised pretrained representations for end-to-end speech recognition
Self-supervised pretraining on speech data has made considerable progress. High-fidelity
representation of the speech signal is learned from large amounts of untranscribed data and shows …
Cascaded encoders for unifying streaming and non-streaming ASR
End-to-end (E2E) automatic speech recognition (ASR) models, by now, have shown
competitive performance on several benchmarks. These models are structured to either …
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition
Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …
Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge
(M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech …
EEND-SS: Joint end-to-end neural speaker diarization and speech separation for flexible number of speakers
In this paper, we present a novel framework that jointly performs three tasks: speaker
diarization, speech separation, and speaker counting. Our proposed framework integrates …
L-spex: Localized target speaker extraction
Speaker extraction aims to extract the target speaker's voice from a multi-talker speech
mixture given an auxiliary reference utterance. Recent studies show that speaker extraction …
Single channel voice separation for unknown number of speakers under reverberant and noisy settings
We present a unified network for voice separation of an unknown number of speakers. The
proposed approach is composed of several separation heads optimized together with a …
Dual-path RNN for long recording speech separation
Continuous speech separation (CSS) is an emerging task in speech separation that aims to
separate overlap-free targets from a long, partially overlapped recording. A straightforward …
End-to-end speaker diarization conditioned on speech activity and overlap detection
In this paper, we present a conditional multitask learning method for end-to-end neural
speaker diarization (EEND). The EEND system has shown promising performance …
Train from scratch: Single-stage joint training of speech separation and recognition
Multi-speaker speech separation and recognition has recently gained much attention in the speech
community. Previously, most studies trained the front-end separation module and back …