Quartznet: Deep automatic speech recognition with 1d time-channel separable convolutions

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 97 · Related articles · All 6 versions

Speech technology for healthcare: Opportunities, challenges, and state of the art

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org

Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

Cited by 127 · Related articles · All 3 versions

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Cited by 567 · Related articles · All 5 versions

Conformer: Convolution-augmented transformer for speech recognition

A Gulati, J Qin, CC Chiu, N Parmar, Y Zhang… - arXiv preprint arXiv …, 2020 - arxiv.org

Recently Transformer and Convolution neural network (CNN) based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural …

Cited by 2778 · Related articles · All 12 versions

Automatic speech recognition: a survey

M Malik, MK Malik, K Mehmood… - Multimedia Tools and …, 2021 - Springer

Recently great strides have been made in the field of automatic speech recognition (ASR) by
using various deep learning techniques. In this study, we present a thorough comparison …

Cited by 271 · Related articles · All 8 versions

Squeezeformer: An efficient transformer for automatic speech recognition

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc

The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …

Cited by 70 · Related articles · All 7 versions

Contextnet: Improving convolutional neural networks for automatic speech recognition with global context

W Han, Z Zhang, Y Zhang, J Yu, CC Chiu, J Qin… - arXiv preprint arXiv …, 2020 - arxiv.org

Convolutional neural networks (CNN) have shown promising results for end-to-end speech
recognition, albeit still behind other state-of-the-art methods in performance. In this paper …

Cited by 288 · Related articles · All 10 versions

Intermediate loss regularization for ctc-based speech recognition

J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org

We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …

Cited by 123 · Related articles · All 5 versions

Titanet: Neural model for speaker representation with 1d depth-wise separable convolutions and global context

NR Koluguri, T Park, B Ginsburg - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …

Cited by 78 · Related articles · All 2 versions

Audio-visual efficient conformer for robust speech recognition

M Burchi, R Timofte - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com

End-to-end Automatic Speech Recognition (ASR) systems based on neural
networks have seen large improvements in recent years. The availability of large scale hand …

Cited by 24 · Related articles · All 5 versions