Sequence to multi-sequence learning via conditional chain mapping for mixture signals

X Chang, T Maekaku, P Guo, J Shi… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

Self-supervised pretraining on speech data has achieved a lot of progress. High-fidelity
representation of the speech signal is learned from a lot of untranscribed data and shows …

Cited by 79 · Related articles · All 9 versions

Cascaded encoders for unifying streaming and non-streaming ASR

A Narayanan, TN Sainath, R Pang, J Yu… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

End-to-end (E2E) automatic speech recognition (ASR) models, by now, have shown
competitive performance on several benchmarks. These models are structured to either …

Cited by 78 · Related articles · All 4 versions

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition

AS Subramanian, C Weng, S Watanabe, M Yu… - Computer Speech & …, 2022 - Elsevier

Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …

Cited by 68 · Related articles · All 5 versions

Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge

F Yu, S Zhang, P Guo, Y Fu, Z Du… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge
(M2MeT) focuses on one of the most valuable and the most challenging scenarios of speech …

Cited by 25 · Related articles · All 5 versions

EEND-SS: Joint end-to-end neural speaker diarization and speech separation for flexible number of speakers

S Maiti, Y Ueda, S Watanabe, C Zhang… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

In this paper, we present a novel framework that jointly performs three tasks: speaker
diarization, speech separation, and speaker counting. Our proposed framework integrates …

Cited by 20 · Related articles · All 5 versions

L-spex: Localized target speaker extraction

M Ge, C Xu, L Wang, ES Chng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech
mixture given an auxiliary reference utterance. Recent studies show that speaker extraction …

Cited by 19 · Related articles · All 3 versions

Single channel voice separation for unknown number of speakers under reverberant and noisy settings

SE Chazan, L Wolf, E Nachmani… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

We present a unified network for voice separation of an unknown number of speakers. The
proposed approach is composed of several separation heads optimized together with a …

Cited by 27 · Related articles · All 4 versions

Dual-path RNN for long recording speech separation

C Li, Y Luo, C Han, J Li, T Yoshioka… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

Continuous speech separation (CSS) is an arising task in speech separation aiming at
separating overlap-free targets from a long, partially-overlapped recording. A straightforward …

Cited by 24 · Related articles · All 3 versions

End-to-end speaker diarization conditioned on speech activity and overlap detection

Y Takashima, Y Fujita, S Watanabe… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

In this paper, we present a conditional multitask learning method for end-to-end neural
speaker diarization (EEND). The EEND system has shown promising performance …

Cited by 22 · Related articles · All 7 versions

Train from scratch: Single-stage joint training of speech separation and recognition

J Shi, X Chang, S Watanabe, B Xu - Computer Speech & Language, 2022 - Elsevier

Multi-speaker speech separation and recognition gains much attention in the speech
community recently. Previously, most studies train the front-end separation module and back …

Cited by 11 · Related articles · All 4 versions