Far-field automatic speech recognition
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …
End-to-end integration of speech recognition, speech enhancement, and self-supervised learning representation
This work presents our end-to-end (E2E) automatic speech recognition (ASR) model
targeting robust speech recognition, called Integrated speech Recognition with …
Wav2vec-switch: Contrastive learning from original-noisy speech pairs for robust speech recognition
The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to
learn good speech representations from a large amount of unlabeled speech for the …
RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing
We present RemixIT, a simple yet effective self-supervised method for training speech
enhancement without the need of a single isolated in-domain speech nor a noise waveform …
ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration
We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …
Interactive feature fusion for end-to-end noise-robust speech recognition
Speech enhancement (SE) aims to suppress the additive noise from noisy speech signals to
improve the speech's perceptual quality and intelligibility. However, the over-suppression …
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet),
an end-to-end speech processing toolkit. This project was initiated in December 2017 to …
Jointly optimal denoising, dereverberation, and source separation
This article proposes methods that can optimize a Convolutional BeamFormer (CBF) for
jointly performing denoising, dereverberation, and source separation (DN+DR+SS) in a …
Gradient remedy for multi-task learning in end-to-end noise-robust speech recognition
Speech enhancement (SE) is proved effective in reducing noise from noisy speech signals
for downstream automatic speech recognition (ASR), where multi-task learning strategy is …
End-to-end dereverberation, beamforming, and speech recognition in a cocktail party
Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing attention
in recent years. Most existing methods feature a signal processing frontend and an ASR …