Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Recent progress in the CUHK dysarthric speech recognition system

S Liu, M Geng, S Hu, X Xie, M Cui, J Yu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past
few decades, recognition of disordered speech remains a highly challenging task to date …

Automatic speech recognition method based on deep learning approaches for Uzbek language

A Mukhamadiyev, I Khujayarov, O Djuraev, J Cho - Sensors, 2022 - mdpi.com
Communication has been an important aspect of human life, civilization, and globalization
for thousands of years. Biometric analysis, education, security, healthcare, and smart cities …

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z Jin, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

Bayesian neural network language modeling for speech recognition

B Xue, S Hu, J Xu, M Geng, X Liu… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
State-of-the-art neural network language models (NNLMs) represented by long short term
memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly …

Restoring speaking lips from occlusion for audio-visual speech recognition

J Wang, Z Pan, M Zhang, RT Tan, H Li - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Prior studies on audio-visual speech recognition typically assume the visibility of speaking
lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely …

End-to-end multi-modal speech recognition on an air and bone conducted speech corpus

M Wang, J Chen, XL Zhang… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) has been significantly improved in the past years.
However, most robust ASR systems are based on air-conducted (AC) speech, and their …

Audio-visual multi-channel speech separation, dereverberation and recognition

G Li, J Yu, J Deng, X Liu, H Meng - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate
recognition of cocktail party speech characterised by the interference from overlapping …

Multi-channel multi-speaker ASR using 3D spatial feature

Y Shao, SX Zhang, D Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) of multi-channel multi-speaker overlapped speech
remains one of the most challenging tasks to the speech community. In this paper, we look …

Mixed precision DNN quantization for overlapped speech separation and recognition

J Xu, J Yu, X Liu, H Meng - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Recognition of overlapped speech has been a highly challenging task to date. State-of-the-art
multi-channel speech separation systems are becoming increasingly complex and …