Speech technology for healthcare: Opportunities, challenges, and state of the art
Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …
Automatic speech recognition using advanced deep learning approaches: A survey
Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …
TransVG: End-to-end visual grounding with transformers
In this paper, we present a neat yet effective transformer-based framework for visual
grounding, namely TransVG, to address the task of grounding a language query to the …
Conformer: Convolution-augmented transformer for speech recognition
Recently Transformer and Convolution neural network (CNN) based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
Transformer transducer: A streamable speech recognition model with transformer encoders and RNN-T loss
In this paper we present an end-to-end speech recognition model with Transformer
encoders that can be used in a streaming speech recognition system. Transformer …
Squeezeformer: An efficient transformer for automatic speech recognition
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …
ContextNet: Improving convolutional neural networks for automatic speech recognition with global context
Convolutional neural networks (CNN) have shown promising results for end-to-end speech
recognition, albeit still behind other state-of-the-art methods in performance. In this paper …
Developing real-time streaming transformer transducer for speech recognition on large-scale dataset
Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …
Emformer: Efficient memory transformer based acoustic model for low latency streaming speech recognition
This paper proposes an efficient memory transformer Emformer for low latency streaming
speech recognition. In Emformer, the long-range history context is distilled into an …