Speakerbeam: Speaker aware neural network for target speaker extraction in speech mixtures

K Žmolíková, M Delcroix, K Kinoshita… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
The processing of speech corrupted by interfering overlapping speakers is one of the
challenging problems with regards to today's automatic speech recognition systems …

A review on speech separation in cocktail party environment: challenges and approaches

J Agrawal, M Gupta, H Garg - Multimedia Tools and Applications, 2023 - Springer
The Cocktail party problem, which is tracing and identifying a specific speaker's speech
while numerous speakers communicate concurrently is one of the crucial problems still to be …

Combining spectral and spatial features for deep learning based blind speaker separation

ZQ Wang, DL Wang - … ACM Transactions on audio, speech, and …, 2018 - ieeexplore.ieee.org
This study tightly integrates complementary spectral and spatial features for deep learning
based multi-channel speaker separation in reverberant environments. The key idea is to …

Deep learning based phase reconstruction for speaker separation: A trigonometric perspective

ZQ Wang, K Tan, DL Wang - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
This study investigates phase reconstruction for deep learning based monaural
talker-independent speaker separation in the short-time Fourier transform (STFT) domain. The key …

Deep extractor network for target speaker recovery from single channel speech mixtures

J Wang, J Chen, D Su, L Chen, M Yu, Y Qian… - arXiv preprint arXiv …, 2018 - arxiv.org
Speaker-aware source separation methods are promising workarounds for major difficulties
such as arbitrary source permutation and unknown number of sources. However, it remains …

A survey of unsupervised learning methods for high-dimensional uncertainty quantification in black-box-type problems

K Kontolati, D Loukrezis, DG Giovanis… - Journal of …, 2022 - Elsevier
Constructing surrogate models for uncertainty quantification (UQ) on complex partial
differential equations (PDEs) having inherently high-dimensional O(10^n), n ≥ 2, stochastic …

End-to-end monaural multi-speaker ASR system without pretraining

X Chang, Y Qian, K Yu… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Recently, end-to-end models have become a popular approach as an alternative to
traditional hybrid models in automatic speech recognition (ASR). The multi-speaker speech …

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z Jin, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

Single-channel multi-talker speech recognition with permutation invariant training

Y Qian, X Chang, D Yu - Speech Communication, 2018 - Elsevier
Although great progress has been made in automatic speech recognition (ASR), significant
performance degradation is still observed when recognizing multi-talker mixed speech. In …

Challenges and feasibility of automatic speech recognition for modeling student collaborative discourse in classrooms

R Southwell, S Pugh, M Perkoff, C Clevenger… - … Data Mining Society, 2022 - par.nsf.gov
Automatic speech recognition (ASR) has considerable potential to model aspects of
classroom discourse with the goals of automated assessment, feedback, and instructional …