A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Speech technology for healthcare: Opportunities, challenges, and state of the art
Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …
SpeechBrain: A general-purpose speech toolkit
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …
Conformer: Convolution-augmented transformer for speech recognition
Recently Transformer and Convolution neural network (CNN) based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural …
Automatic speech recognition: a survey
Recently great strides have been made in the field of automatic speech recognition (ASR) by
using various deep learning techniques. In this study, we present a thorough comparison …
Squeezeformer: An efficient transformer for automatic speech recognition
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …
ContextNet: Improving convolutional neural networks for automatic speech recognition with global context
Convolutional neural networks (CNN) have shown promising results for end-to-end speech
recognition, albeit still behind other state-of-the-art methods in performance. In this paper …
Intermediate loss regularization for CTC-based speech recognition
J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …
TitaNet: Neural model for speaker representation with 1D depth-wise separable convolutions and global context
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …
Audio-visual efficient conformer for robust speech recognition
End-to-end Automatic Speech Recognition (ASR) systems based on neural
networks have seen large improvements in recent years. The availability of large scale hand …