Speech recognition using deep neural networks: A systematic review

AB Nassif, I Shahin, I Attili, M Azzeh, K Shaalan - IEEE Access, 2019 - ieeexplore.ieee.org
Over the past decades, a tremendous amount of research has been done on the use of
machine learning for speech processing applications, especially speech recognition …

Recent progress in the CUHK dysarthric speech recognition system

S Liu, M Geng, S Hu, X Xie, M Cui, J Yu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past
few decades, recognition of disordered speech remains a highly challenging task to date …

RandAugment: Practical automated data augmentation with a reduced search space

ED Cubuk, B Zoph, J Shlens… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Recent work on automated augmentation strategies has led to state-of-the-art results in
image classification and object detection. An obstacle to a large-scale adoption of these …

SpecAugment: A simple data augmentation method for automatic speech recognition

DS Park, W Chan, Y Zhang, CC Chiu, B Zoph… - arXiv preprint arXiv …, 2019 - arxiv.org
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

Audio augmentation for speech recognition.

T Ko, V Peddinti, D Povey, S Khudanpur - Interspeech, 2015 - isca-archive.org
Data augmentation is a common strategy adopted to increase the quantity of training data,
avoid overfitting and improve robustness of the models. In this paper, we investigate audio …

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

E Vincent, S Watanabe, AA Nugraha, J Barker… - Computer Speech & …, 2017 - Elsevier
Speech enhancement and automatic speech recognition (ASR) are most often evaluated in
matched (or multi-condition) settings where the acoustic conditions of the training data …

Data augmentation for deep neural network acoustic modeling

X Cui, V Goel, B Kingsbury - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org
This paper investigates data augmentation for deep neural network acoustic modeling
based on label-preserving transformations to deal with data sparsity. Two data …

Data augmenting contrastive learning of speech representations in the time domain

E Kharitonov, M Rivière, G Synnaeve… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Contrastive Predictive Coding (CPC), based on predicting future segments of speech from
past segments, is emerging as a powerful algorithm for representation learning of speech …