Transformer-based acoustic modeling for hybrid speech recognition

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org

Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

Cited by 127 - Related articles - All 3 versions

[PDF] arxiv.org

Automatic speech recognition using advanced deep learning approaches: A survey

H Kheddar, M Hemis, Y Himeur - Information Fusion, 2024 - Elsevier

Recent advancements in deep learning (DL) have posed a significant challenge for
automatic speech recognition (ASR). ASR relies on extensive training datasets, including …

Cited by 6 - Related articles - All 3 versions

[PDF] thecvf.com

TransVG: End-to-end visual grounding with transformers

J Deng, Z Yang, T Chen, W Zhou… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

In this paper, we present a neat yet effective transformer-based framework for visual
grounding, namely TransVG, to address the task of grounding a language query to the …

Cited by 279 - Related articles - All 6 versions

[PDF] arxiv.org

Conformer: Convolution-augmented transformer for speech recognition

A Gulati, J Qin, CC Chiu, N Parmar, Y Zhang… - arXiv preprint arXiv …, 2020 - arxiv.org

Recently Transformer and Convolution neural network (CNN) based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural …

Cited by 2826 - Related articles - All 12 versions

[PDF] ieee.org

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 80 - Related articles - All 6 versions

[PDF] arxiv.org

Transformer transducer: A streamable speech recognition model with transformer encoders and RNN-T loss

Q Zhang, H Lu, H Sak, A Tripathi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

In this paper we present an end-to-end speech recognition model with Transformer
encoders that can be used in a streaming speech recognition system. Transformer …

Cited by 488 - Related articles - All 6 versions

[PDF] neurips.cc

Squeezeformer: An efficient transformer for automatic speech recognition

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc

The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …

Cited by 74 - Related articles - All 7 versions

[PDF] arxiv.org

ContextNet: Improving convolutional neural networks for automatic speech recognition with global context

W Han, Z Zhang, Y Zhang, J Yu, CC Chiu, J Qin… - arXiv preprint arXiv …, 2020 - arxiv.org

Convolutional neural networks (CNN) have shown promising results for end-to-end speech
recognition, albeit still behind other state-of-the-art methods in performance. In this paper …

Cited by 290 - Related articles - All 10 versions

[PDF] arxiv.org

Developing real-time streaming transformer transducer for speech recognition on large-scale dataset

X Chen, Y Wu, Z Wang, S Liu… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …

Cited by 182 - Related articles - All 3 versions

[PDF] arxiv.org

Emformer: Efficient memory transformer based acoustic model for low latency streaming speech recognition

Y Shi, Y Wang, C Wu, CF Yeh, J Chan… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

This paper proposes an efficient memory transformer Emformer for low latency streaming
speech recognition. In Emformer, the long-range history context is distilled into an …

Cited by 162 - Related articles - All 3 versions