Audio-visual multi-channel integration and recognition of overlapped speech

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org

Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Cited by 80 Related articles All 5 versions

[PDF] arxiv.org

Recent progress in the CUHK dysarthric speech recognition system

S Liu, M Geng, S Hu, X Xie, M Cui, J Yu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Despite the rapid progress of automatic speech recognition (ASR) technologies in the past
few decades, recognition of disordered speech remains a highly challenging task to date …

Cited by 74 Related articles All 8 versions

[PDF] mdpi.com

Automatic speech recognition method based on deep learning approaches for Uzbek language

A Mukhamadiyev, I Khujayarov, O Djuraev, J Cho - Sensors, 2022 - mdpi.com

Communication has been an important aspect of human life, civilization, and globalization
for thousands of years. Biometric analysis, education, security, healthcare, and smart cities …

Cited by 73 Related articles All 10 versions

[PDF] arxiv.org

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z Jin, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

Cited by 13 Related articles All 6 versions

[PDF] arxiv.org

Bayesian neural network language modeling for speech recognition

B Xue, S Hu, J Xu, M Geng, X Liu… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org

State-of-the-art neural network language models (NNLMs) represented by long short term
memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly …

Cited by 22 Related articles All 5 versions

[PDF] aaai.org

Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition

J Wang, Z Pan, M Zhang, RT Tan, H Li - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Prior studies on audio-visual speech recognition typically assume the visibility of speaking
lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely …

Cited by 8 Related articles

[PDF] xiaolei-zhang.net

End-to-end multi-modal speech recognition on an air and bone conducted speech corpus

M Wang, J Chen, XL Zhang… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org

Automatic speech recognition (ASR) has been significantly improved in the past years.
However, most robust ASR systems are based on air-conducted (AC) speech, and their …

Cited by 19 Related articles All 3 versions

[PDF] arxiv.org

Audio-visual multi-channel speech separation, dereverberation and recognition

G Li, J Yu, J Deng, X Liu, H Meng - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate
recognition of cocktail party speech characterised by the interference from overlapping …

Cited by 11 Related articles All 4 versions

[PDF] arxiv.org

Multi-channel multi-speaker ASR using 3D spatial feature

Y Shao, SX Zhang, D Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Automatic speech recognition (ASR) of multi-channel multi-speaker overlapped speech
remains one of the most challenging tasks to the speech community. In this paper, we look …

Cited by 15 Related articles All 3 versions

[PDF] arxiv.org

Mixed precision DNN quantization for overlapped speech separation and recognition

J Xu, J Yu, X Liu, H Meng - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Recognition of overlapped speech has been a highly challenging task to date. State-of-the-art
multi-channel speech separation systems are becoming increasingly complex and …

Cited by 12 Related articles All 4 versions