Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation

Y Luo, N Mesgarani - IEEE/ACM transactions on audio, speech …, 2019 - ieeexplore.ieee.org
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …

Past review, current progress, and challenges ahead on the cocktail party problem

Y Qian, C Weng, X Chang, S Wang, D Yu - Frontiers of Information …, 2018 - Springer
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker
when multiple speakers talk simultaneously, is one of the critical problems yet to be solved …

Single channel target speaker extraction and recognition with speaker beam

M Delcroix, K Zmolikova, K Kinoshita… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
This paper addresses the problem of single channel speech recognition of a target speaker
in a mixture of speech signals. We propose to exploit auxiliary speaker information provided …

Deep extractor network for target speaker recovery from single channel speech mixtures

J Wang, J Chen, D Su, L Chen, M Yu, Y Qian… - arXiv preprint arXiv …, 2018 - arxiv.org
Speaker-aware source separation methods are promising workarounds for major difficulties
such as arbitrary source permutation and unknown number of sources. However, it remains …

End-to-end monaural multi-speaker ASR system without pretraining

X Chang, Y Qian, K Yu… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Recently, end-to-end models have become a popular approach as an alternative to
traditional hybrid models in automatic speech recognition (ASR). The multi-speaker speech …

MIMO-Speech: End-to-end multi-channel multi-speaker speech recognition

X Chang, W Zhang, Y Qian, J Le Roux… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
Recently, the end-to-end approach has proven its efficacy in monaural multi-speaker speech
recognition. However, high word error rates (WERs) still prevent these systems from being …

End-to-end dereverberation, beamforming, and speech recognition in a cocktail party

W Zhang, X Chang, C Boeddeker… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing attention
in recent years. Most existing methods feature a signal processing frontend and an ASR …

Progressive joint modeling in unsupervised single-channel overlapped speech recognition

Z Chen, J Droppo, J Li, W Xiong - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
Unsupervised single-channel overlapped speech recognition is one of the hardest problems
in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state of the …

Learning to enhance or not: Neural network-based switching of enhanced and observed signals for overlapping speech recognition

H Sato, T Ochiai, M Delcroix, K Kinoshita… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The combination of a deep neural network (DNN)-based speech enhancement (SE) front-end
and an automatic speech recognition (ASR) back-end is a widely used approach to …

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

T von Neumann, C Boeddeker, L Drude… - arXiv preprint arXiv …, 2020 - arxiv.org
Most approaches to multi-talker overlapped speech separation and recognition assume that
the number of simultaneously active speakers is given, but in realistic situations, it is typically …