A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Speech technology for healthcare: Opportunities, challenges, and state of the art

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org
Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Conformer: Convolution-augmented transformer for speech recognition

A Gulati, J Qin, CC Chiu, N Parmar, Y Zhang… - arXiv preprint arXiv …, 2020 - arxiv.org
Recently Transformer and Convolution neural network (CNN) based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural …

Automatic speech recognition: a survey

M Malik, MK Malik, K Mehmood… - Multimedia Tools and …, 2021 - Springer
Recently great strides have been made in the field of automatic speech recognition (ASR) by
using various deep learning techniques. In this study, we present a thorough comparison …

Squeezeformer: An efficient transformer for automatic speech recognition

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …

Contextnet: Improving convolutional neural networks for automatic speech recognition with global context

W Han, Z Zhang, Y Zhang, J Yu, CC Chiu, J Qin… - arXiv preprint arXiv …, 2020 - arxiv.org
Convolutional neural networks (CNN) have shown promising results for end-to-end speech
recognition, albeit still behind other state-of-the-art methods in performance. In this paper …

Intermediate loss regularization for ctc-based speech recognition

J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …

Titanet: Neural model for speaker representation with 1d depth-wise separable convolutions and global context

NR Koluguri, T Park, B Ginsburg - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …

Audio-visual efficient conformer for robust speech recognition

M Burchi, R Timofte - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com
Abstract End-to-end Automatic Speech Recognition (ASR) systems based on neural
networks have seen large improvements in recent years. The availability of large scale hand …