Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
Y Luo, N Mesgarani - IEEE/ACM transactions on audio, speech …, 2019 - ieeexplore.ieee.org
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …
Past review, current progress, and challenges ahead on the cocktail party problem
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker
when multiple speakers talk simultaneously, is one of the critical problems yet to be solved …
Single channel target speaker extraction and recognition with speaker beam
This paper addresses the problem of single channel speech recognition of a target speaker
in a mixture of speech signals. We propose to exploit auxiliary speaker information provided …
Deep extractor network for target speaker recovery from single channel speech mixtures
Speaker-aware source separation methods are promising workarounds for major difficulties
such as arbitrary source permutation and unknown number of sources. However, it remains …
End-to-end monaural multi-speaker ASR system without pretraining
Recently, end-to-end models have become a popular approach as an alternative to
traditional hybrid models in automatic speech recognition (ASR). The multi-speaker speech …
MIMO-Speech: End-to-end multi-channel multi-speaker speech recognition
Recently, the end-to-end approach has proven its efficacy in monaural multi-speaker speech
recognition. However, high word error rates (WERs) still prevent these systems from being …
End-to-end dereverberation, beamforming, and speech recognition in a cocktail party
Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing attention
in recent years. Most existing methods feature a signal processing frontend and an ASR …
Progressive joint modeling in unsupervised single-channel overlapped speech recognition
Unsupervised single-channel overlapped speech recognition is one of the hardest problems
in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state of the …
Learning to enhance or not: Neural network-based switching of enhanced and observed signals for overlapping speech recognition
The combination of a deep neural network (DNN)-based speech enhancement (SE) front-
end and an automatic speech recognition (ASR) back-end is a widely used approach to …
Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR
Most approaches to multi-talker overlapped speech separation and recognition assume that
the number of simultaneously active speakers is given, but in realistic situations, it is typically …