Noise-robust speech recognition: A comparative analysis of LSTM and CNN approaches

N Djeffal, D Addou, H Kheddar… - 2023 2nd International …, 2023 - ieeexplore.ieee.org
2023 2nd International Conference on Electronics, Energy and …, 2023 - ieeexplore.ieee.org
This paper proposes two deep-learning techniques for automatic speech recognition (ASR), applied to the Aurora-2 dataset in clean and multi-condition training modes and evaluated on three test sets containing different types of noise: subway, restaurant, and suburban noise, denoted A, B, and C respectively. The proposed scheme offers two classification approaches. The first employs convolutional neural networks (CNN), which receive an image representation of the speech (a spectrogram) as input. The second employs recurrent neural networks (RNN) with long short-term memory (LSTM), which receive Mel-frequency cepstral coefficients (MFCCs) as input features. Both are compared with a baseline built on the Hidden Markov Model Toolkit (HTK). Experiments show that the CNN gives better classification results than the other methods in noisy environments. At 20 dB it reaches an accuracy of 97.31%, compared with 90.93% for LSTM and 96.31% for HMM. At −5 dB it reaches 42.28%, compared with 22.14% for LSTM and 29.87% for HMM.
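To make the size of the reported improvements concrete, the short Python sketch below recomputes them from the accuracies quoted in the abstract. The accuracy values come from the abstract. The names `REPORTED`, `absolute_gain`, and `relative_error_reduction` are illustrative assumptions, and the relative error reduction is an extra way of reading the numbers, not a metric the paper is known to report.

```python
# Accuracies (%) quoted in the abstract, keyed by SNR in dB.
# The dictionary layout and function names are hypothetical, for illustration only.
REPORTED = {
    20: {"CNN": 97.31, "LSTM": 90.93, "HMM": 96.31},
    -5: {"CNN": 42.28, "LSTM": 22.14, "HMM": 29.87},
}


def absolute_gain(snr_db, baseline, model="CNN"):
    """Accuracy difference in percentage points (model minus baseline)."""
    row = REPORTED[snr_db]
    return round(row[model] - row[baseline], 2)


def relative_error_reduction(snr_db, baseline, model="CNN"):
    """Fraction of the baseline's error (100 - accuracy) removed by the model."""
    row = REPORTED[snr_db]
    baseline_error = 100.0 - row[baseline]
    model_error = 100.0 - row[model]
    return (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    for snr_db in REPORTED:
        for baseline in ("LSTM", "HMM"):
            gain = absolute_gain(snr_db, baseline)
            reduction = relative_error_reduction(snr_db, baseline)
            print(f"{snr_db:>3} dB, CNN vs {baseline}: "
                  f"+{gain:.2f} points, {reduction:.1%} fewer errors")
```

Read this way, the absolute gain over the HMM baseline is small at 20 dB and much larger at −5 dB, which matches the abstract's claim that the CNN helps most in noisy conditions.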