Multi-channel speech recognition: LSTMs all the way through

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org

Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

Cited by 391 - Related articles - All 10 versions

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org

The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Cited by 101 - Related articles - All 8 versions

Internal language model estimation for domain-adaptive end-to-end speech recognition

Z Meng, S Parthasarathy, E Sun, Y Gaur… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

The external language models (LM) integration remains a challenging task for end-to-end
(E2E) automatic speech recognition (ASR) which has no clear division between acoustic …

Cited by 102 - Related articles - All 5 versions

FaSNet: Low-latency adaptive beamforming for multi-microphone audio processing

Y Luo, C Han, N Mesgarani, E Ceolini… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org

Beamforming has been extensively investigated for multi-channel audio processing tasks.
Recently, learning-based beamforming methods, sometimes called neural beamformers …

Cited by 149 - Related articles - All 6 versions

Neural spectrospatial filtering

K Tan, ZQ Wang, DL Wang - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org

As the most widely-used spatial filtering approach for multi-channel speech separation,
beamforming extracts the target speech signal arriving from a specific direction. An …

Cited by 52 - Related articles - All 5 versions

Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation

WN Hsu, Y Zhang, J Glass - 2017 IEEE automatic speech …, 2017 - ieeexplore.ieee.org

Domain mismatch between training and testing can lead to significant degradation in
performance in many machine learning scenarios. Unfortunately, this is not a rare situation …

Cited by 164 - Related articles - All 10 versions

LCANet: End-to-end lipreading with cascaded attention-CTC

K Xu, D Li, N Cassimatis, X Wang - 2018 13th IEEE …, 2018 - ieeexplore.ieee.org

Machine lipreading is a special type of automatic speech recognition (ASR) which
transcribes human speech by visually interpreting the movement of related face regions …

Cited by 131 - Related articles - All 6 versions

Speaker-invariant training via adversarial learning

Z Meng, J Li, Z Chen, Y Zhao, V Mazalov… - … , Speech and Signal …, 2018 - ieeexplore.ieee.org

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the
inter-talker feature variability while maximizing its senone discriminability so as to enhance …

Cited by 135 - Related articles - All 6 versions

Conditional teacher-student learning

Z Meng, J Li, Y Zhao, Y Gong - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

The teacher-student (T/S) learning has been shown to be effective for a variety of problems
such as domain adaptation and model compression. One shortcoming of the T/S learning is …

Cited by 105 - Related articles - All 6 versions

Internal language model training for domain-adaptive end-to-end speech recognition

Z Meng, N Kanda, Y Gaur… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …

Cited by 51 - Related articles - All 4 versions