Speech enhancement using end-to-end speech recognition objectives

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org

The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Cited by 114 Related articles All 8 versions

[PDF] arxiv.org

End-to-end integration of speech recognition, speech enhancement, and self-supervised learning representation

X Chang, T Maekaku, Y Fujita, S Watanabe - arXiv preprint arXiv …, 2022 - arxiv.org

This work presents our end-to-end (E2E) automatic speech recognition (ASR) model
targeting robust speech recognition, called Integrated speech Recognition with …

Cited by 53 Related articles All 8 versions

[PDF] arxiv.org

Wav2vec-switch: Contrastive learning from original-noisy speech pairs for robust speech recognition

Y Wang, J Li, H Wang, Y Qian… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to
learn good speech representations from a large amount of unlabeled speech for the …

Cited by 71 Related articles All 5 versions

[PDF] ieee.org

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

E Tzinis, Y Adi, VK Ithapu, B Xu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

We present RemixIT, a simple yet effective self-supervised method for training speech
enhancement without the need of a single isolated in-domain speech nor a noise waveform …

Cited by 53 Related articles All 5 versions

[PDF] arxiv.org

ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration

C Li, J Shi, W Zhang, AS Subramanian… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …

Cited by 86 Related articles All 5 versions

[PDF] arxiv.org

Interactive feature fusion for end-to-end noise-robust speech recognition

Y Hu, N Hou, C Chen, ES Chng - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Speech enhancement (SE) aims to suppress the additive noise from noisy speech signals to
improve the speech's perceptual quality and intelligibility. However, the over-suppression …

Cited by 45 Related articles All 4 versions

[PDF] arxiv.org

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

S Watanabe, F Boyer, X Chang, P Guo… - 2021 IEEE Data …, 2021 - ieeexplore.ieee.org

This paper describes the recent development of ESPnet (https://github.com/espnet/espnet),
an end-to-end speech processing toolkit. This project was initiated in December 2017 to …

Cited by 56 Related articles All 7 versions

[PDF] arxiv.org

Jointly optimal denoising, dereverberation, and source separation

T Nakatani, C Boeddeker, K Kinoshita… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org

This article proposes methods that can optimize a Convolutional BeamFormer (CBF) for
jointly performing denoising, dereverberation, and source separation (DN+DR+SS) in a …

Cited by 62 Related articles All 6 versions

[PDF] arxiv.org

Gradient remedy for multi-task learning in end-to-end noise-robust speech recognition

Y Hu, C Chen, R Li, Q Zhu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Speech enhancement (SE) has proven effective in reducing noise from noisy speech signals
for downstream automatic speech recognition (ASR), where multi-task learning strategy is …

Cited by 22 Related articles All 4 versions

[PDF] uni-paderborn.de

End-to-end dereverberation, beamforming, and speech recognition in a cocktail party

W Zhang, X Chang, C Boeddeker… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org

Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing attention
in recent years. Most existing methods feature a signal processing frontend and an ASR …

Cited by 17 Related articles All 5 versions