Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

Conditional diffusion probabilistic model for speech enhancement

YJ Lu, ZQ Wang, S Watanabe… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech enhancement is a critical component of many user-oriented audio applications, yet
current systems still suffer from distorted and unnatural outputs. While generative models …
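
For orientation, the sketch below shows the generic denoising-diffusion training step that conditional models of this kind build on: clean speech is progressively noised, and a network is trained to predict the injected noise given the noisy recording as a conditioner. The schedule values and the `eps_model(x_t, y, t)` interface are assumptions for illustration, not the paper's exact conditional formulation.

```python
import torch
import torch.nn as nn

# Minimal sketch of a conditional diffusion training step for speech enhancement.
# `eps_model(x_t, y, t)` is an assumed denoiser that predicts the injected noise
# given the noised latent x_t, the observed noisy speech y, and the timestep t.

T = 200
betas = torch.linspace(1e-4, 0.05, T)              # noise schedule (illustrative values)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(eps_model: nn.Module, x0: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """x0: clean speech batch, y: corresponding noisy speech batch (conditioner)."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    a_bar = alphas_cumprod.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    # forward process: interpolate the clean signal toward Gaussian noise
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    # the network recovers the injected noise, conditioned on the noisy observation y
    return nn.functional.mse_loss(eps_model(x_t, y, t), eps)
```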

CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings

S Watanabe, M Mandel, J Barker, E Vincent… - arXiv preprint arXiv …, 2020 - arxiv.org
Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the
6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge …

Hybrid CTC/attention architecture for end-to-end speech recognition

S Watanabe, T Hori, S Kim, JR Hershey… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
Conventional automatic speech recognition (ASR) based on a hidden Markov model
(HMM)/deep neural network (DNN) is a very complicated system consisting of various …
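
As a rough illustration of the score-combination idea behind the hybrid architecture, the sketch below rescores n-best hypotheses with an interpolation of CTC and attention log-probabilities. The paper itself performs one-pass joint beam-search decoding; this rescoring variant, and the `ctc_log_prob`/`att_log_prob` callables, are simplifying assumptions.

```python
from typing import Callable, List, Tuple

def rescore_nbest(
    hyps: List[List[int]],                       # n-best token-ID sequences
    ctc_log_prob: Callable[[List[int]], float],  # assumed: returns log p_ctc(y | x)
    att_log_prob: Callable[[List[int]], float],  # assumed: returns log p_att(y | x)
    lam: float = 0.3,                            # CTC weight, 0 <= lam <= 1
) -> List[Tuple[List[int], float]]:
    """Rank hypotheses by lam * log p_ctc + (1 - lam) * log p_att."""
    scored = [
        (y, lam * ctc_log_prob(y) + (1.0 - lam) * att_log_prob(y))
        for y in hyps
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```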

Joint CTC-attention based end-to-end speech recognition using multi-task learning

S Kim, T Hori, S Watanabe - 2017 IEEE international …, 2017 - ieeexplore.ieee.org
Recently, there has been an increasing interest in end-to-end speech recognition that
directly transcribes speech to text without any predefined alignments. One approach is the …
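
The multi-task objective named in the title interpolates a CTC loss and an attention (cross-entropy) loss computed over a shared encoder, L = λ·L_ctc + (1 − λ)·L_att. A minimal PyTorch sketch, with tensor shapes and the weight λ chosen for illustration:

```python
import torch
import torch.nn as nn

# Joint CTC-attention multi-task loss over a shared encoder.
# Shapes, padding conventions, and the default weight are illustrative assumptions.

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
ce_loss = nn.CrossEntropyLoss(ignore_index=-1)

def joint_loss(
    ctc_logits: torch.Tensor,     # (time, batch, vocab): encoder outputs for CTC
    att_logits: torch.Tensor,     # (batch, out_len, vocab): attention decoder outputs
    targets: torch.Tensor,        # (batch, out_len): reference token IDs, padded with -1
    input_lengths: torch.Tensor,  # (batch,): encoder output lengths
    target_lengths: torch.Tensor, # (batch,): reference lengths
    lam: float = 0.3,             # CTC weight
) -> torch.Tensor:
    l_ctc = ctc_loss(ctc_logits.log_softmax(dim=-1), targets.clamp(min=0),
                     input_lengths, target_lengths)
    l_att = ce_loss(att_logits.transpose(1, 2), targets)
    return lam * l_ctc + (1.0 - lam) * l_att
```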

Robust self-supervised audio-visual speech recognition

B Shi, WN Hsu, A Mohamed - arXiv preprint arXiv:2201.01763, 2022 - arxiv.org
Audio-based automatic speech recognition (ASR) degrades significantly in noisy
environments and is particularly vulnerable to interfering speech, as the model cannot …

The fifth 'CHiME' speech separation and recognition challenge: dataset, task and baselines

J Barker, S Watanabe, E Vincent, J Trmal - arXiv preprint arXiv …, 2018 - arxiv.org
The CHiME challenge series aims to advance robust automatic speech recognition (ASR)
technology by promoting research at the interface of speech and language processing …

Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR

ZQ Wang, P Wang, DL Wang - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org
This study proposes a complex spectral mapping approach for single- and multi-channel
speech enhancement, where deep neural networks (DNNs) are used to predict the real and …
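
A minimal sketch of the mapping idea: the network takes the real and imaginary parts of the noisy STFT as input channels and directly predicts the real and imaginary parts of the clean STFT, which are then inverted back to a waveform. The convolutional stack below is a placeholder, not the architecture used in the paper; training would typically regress the predicted real/imaginary parts (and magnitude) against the clean target.

```python
import torch
import torch.nn as nn

# Complex spectral mapping, single-channel case, with a placeholder network.
N_FFT, HOP = 512, 128

net = nn.Sequential(                      # stand-in for the paper's DNN
    nn.Conv2d(2, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 2, kernel_size=3, padding=1),
)

def enhance(noisy: torch.Tensor) -> torch.Tensor:
    """noisy: (batch, samples) waveform; returns an enhanced waveform of the same length."""
    window = torch.hann_window(N_FFT, device=noisy.device)
    spec = torch.stft(noisy, N_FFT, HOP, window=window, return_complex=True)
    feats = torch.stack([spec.real, spec.imag], dim=1)   # (batch, 2, freq, time)
    est = net(feats)                                     # predict clean real/imag parts
    est_spec = torch.complex(est[:, 0], est[:, 1])
    return torch.istft(est_spec, N_FFT, HOP, window=window, length=noisy.shape[-1])
```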

ESPnet: End-to-end speech processing toolkit

S Watanabe, T Hori, S Karita, T Hayashi… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper introduces a new open source platform for end-to-end speech processing named
ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and …
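
For context, pretrained ESPnet2 ASR models are commonly run through the `Speech2Text` inference interface together with the `espnet_model_zoo` downloader; the sketch below follows that documented pattern, with the model tag and audio file as placeholders.

```python
# pip install espnet espnet_model_zoo soundfile
import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.asr_inference import Speech2Text

d = ModelDownloader()
# "<model-tag>" is a placeholder for any ESPnet2 ASR model tag from the model zoo.
speech2text = Speech2Text(**d.download_and_unpack("<model-tag>"))

speech, rate = soundfile.read("utterance.wav")   # placeholder input file
nbests = speech2text(speech)                     # returns an n-best list
text, tokens, token_ids, hyp = nbests[0]
print(text)
```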