Streaming end-to-end speech recognition with joint CTC-attention based models

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 326 Related articles All 7 versions

[PDF] arxiv.org

Streaming automatic speech recognition with the transformer model

N Moritz, T Hori, J Le Roux - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org

Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art
results in end-to-end automatic speech recognition (ASR). Recently, the transformer …

Cited by 207 Related articles All 11 versions

[PDF] mdpi.com

Attention-inspired artificial neural networks for speech processing: A systematic review

N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com

Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …

Cited by 23 Related articles All 8 versions

[PDF] arxiv.org

How does pre-trained wav2vec 2.0 perform on domain-shifted ASR? An extensive benchmark on air traffic control communications

J Zuluaga-Gomez, A Prasad… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled
speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine …

Cited by 36 Related articles All 11 versions

[PDF] arxiv.org

Spike-triggered non-autoregressive transformer for end-to-end speech recognition

Z Tian, J Yi, J Tao, Y Bai, S Zhang, Z Wen - arXiv preprint arXiv …, 2020 - arxiv.org

Non-autoregressive transformer models have achieved extremely fast inference speed and
comparable performance with autoregressive sequence-to-sequence models in neural …

Cited by 62 Related articles All 6 versions

[PDF] arxiv.org

A new training pipeline for an improved neural transducer

A Zeyer, A Merboldt, R Schlüter, H Ney - arXiv preprint arXiv:2005.09319, 2020 - arxiv.org

The RNN transducer is a promising end-to-end model candidate. We compare the original
training criterion with the full marginalization over all alignments, to the commonly used …

Cited by 56 Related articles All 8 versions

[PDF] arxiv.org

Injecting text in self-supervised speech pretraining

Z Chen, Y Zhang, A Rosenberg… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

Self-supervised pretraining for Automated Speech Recognition (ASR) has shown varied
degrees of success. In this paper, we propose to jointly learn representations during …

Cited by 34 Related articles All 3 versions

[PDF] arxiv.org

Advanced long-context end-to-end speech recognition using context-expanded transformers

T Hori, N Moritz, C Hori, JL Roux - arXiv preprint arXiv:2104.09426, 2021 - arxiv.org

This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …

Cited by 36 Related articles All 6 versions

[PDF] ieee.org

Momentum pseudo-labeling: Semi-supervised ASR with continuously improving pseudo-labels

Y Higuchi, N Moritz, J Le Roux… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

End-to-end automatic speech recognition (ASR) has become a popular alternative to
traditional module-based systems, simplifying the model-building process with a single deep …

Cited by 18 Related articles All 7 versions

[PDF] arxiv.org

Semi-supervised speech recognition via graph-based temporal classification

N Moritz, T Hori, J Le Roux - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Semi-supervised learning has demonstrated promising results in automatic speech
recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for …

Cited by 32 Related articles All 5 versions