Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge

F Yu, S Zhang, P Guo, Y Fu, Z Du… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Challenge
(M2MeT) focuses on one of the most valuable and most challenging scenarios of speech …

Implicit neural spatial filtering for multichannel source separation in the waveform domain

D Markovic, A Defossez, A Richard - arXiv preprint arXiv:2206.15423, 2022 - arxiv.org
We present a single-stage causal waveform-to-waveform multichannel model that can
separate moving sound sources based on their broad spatial locations in a dynamic …

L-SpEx: Localized target speaker extraction

M Ge, C Xu, L Wang, ES Chng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speaker extraction aims to extract the target speaker's voice from a multi-talker speech
mixture given an auxiliary reference utterance. Recent studies show that speaker extraction …

DESNet: A multi-channel network for simultaneous speech dereverberation, enhancement and separation

Y Fu, J Wu, Y Hu, M Xing, L Xie - 2021 IEEE Spoken Language …, 2021 - ieeexplore.ieee.org
In this paper, we propose a multi-channel network for simultaneous speech dereverberation,
enhancement and separation (DESNet). To enable gradient propagation and joint …

A comparative study on speaker-attributed automatic speech recognition in multi-party meetings

F Yu, Z Du, S Zhang, Y Lin, L Xie - arXiv preprint arXiv:2203.16834, 2022 - arxiv.org
In this paper, we conduct a comparative study on speaker-attributed automatic speech
recognition (SA-ASR) in the multi-party meeting scenario, a topic with increasing attention in …

BA-SOT: Boundary-aware serialized output training for multi-talker ASR

Y Liang, F Yu, Y Li, P Guo, S Zhang, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
The recently proposed serialized output training (SOT) simplifies multi-talker automatic
speech recognition (ASR) by generating speaker transcriptions separated by a special …

A neural beamspace-domain filter for real-time multi-channel speech enhancement

W Liu, A Li, X Wang, M Yuan, Y Chen, C Zheng, X Li - Symmetry, 2022 - mdpi.com
Most deep-learning-based multi-channel speech enhancement methods focus on designing
a set of beamforming coefficients to directly filter the low signal-to-noise-ratio signals …
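The "beamforming coefficients" these methods learn generalize the classic delay-and-sum beamformer, which aligns each microphone channel by its steering delay and averages. A minimal sketch for orientation (a textbook illustration, not the cited paper's method; the function name and frequency-domain alignment are my own choices):

```python
import numpy as np

def delay_and_sum(signals, delays, fs):
    """Classic delay-and-sum beamformer.

    signals: (n_channels, n_samples) array of microphone signals.
    delays:  (n_channels,) steering delays in seconds for each channel.
    fs:      sampling rate in Hz.
    Each channel is advanced by its delay via a frequency-domain
    phase shift, then the aligned channels are averaged.
    """
    n_ch, n = signals.shape
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)        # (n//2 + 1,)
    spec = np.fft.rfft(signals, axis=1)           # per-channel spectra
    # exp(+j 2*pi*f*tau) advances a channel delayed by tau, aligning it
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = spec * phase
    return np.fft.irfft(aligned.mean(axis=0), n=n)
```

With zero delays this reduces to a plain channel average; with correct steering delays the target adds coherently while diffuse noise averages down, which is the low-SNR filtering effect the abstract refers to.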

Investigation of practical aspects of single channel speech separation for ASR

J Wu, Z Chen, S Chen, Y Wu, T Yoshioka… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech separation has been successfully applied as a frontend processing module of
conversation transcription systems thanks to its ability to handle overlapped speech and its …

A separation and interaction framework for causal multi-channel speech enhancement

W Liu, A Li, C Zheng, X Li - Digital Signal Processing, 2022 - Elsevier
Multi-channel speech enhancement aims at extracting the desired speech using a
microphone array, which has many potential applications, such as video conferencing …

Streaming Multi-Channel Speech Separation with Online Time-Domain Generalized Wiener Filter

Y Luo - ICASSP 2023-2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org
Most existing streaming neural-network-based multi-channel speech separation systems
consist of a causal network architecture and an online spatial information extraction module …
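The last entry's generalized Wiener filter is an online, time-domain variant; as background, the classic per-frequency multichannel Wiener filter picks weights from the spatial covariances of speech and noise. A minimal real-valued sketch (an illustration of the standard formulation, not the paper's algorithm; function name and reference-channel convention are my own):

```python
import numpy as np

def multichannel_wiener_filter(Rs, Rn, ref=0):
    """Classic multichannel Wiener filter for one frequency bin:
    w = (Rs + Rn)^{-1} Rs e_ref, where Rs and Rn are the
    (n_ch, n_ch) spatial covariance matrices of speech and noise
    and e_ref selects the reference microphone."""
    e = np.zeros(Rs.shape[0])
    e[ref] = 1.0
    return np.linalg.solve(Rs + Rn, Rs @ e)
```

In the single-channel case this collapses to the familiar scalar Wiener gain S/(S+N), e.g. speech power 4 and noise power 1 give a gain of 0.8.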