Vectorized Beam Search for CTC-Attention-Based Speech Recognition.

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 88 · All 6 versions

ESPnet-ST: All-in-one speech translation toolkit

H Inaguma, S Kiyono, K Duh, S Karita… - arXiv preprint arXiv …, 2020 - arxiv.org

We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …

Cited by 167 · All 6 versions

Espresso: A fast end-to-end neural speech recognition toolkit

Y Wang, T Chen, H Xu, S Ding, H Lv… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org

We present Espresso, an open-source, modular, extensible end-to-end neural automatic
speech recognition (ASR) toolkit based on the deep learning library PyTorch and the …

Cited by 88 · All 7 versions

CTC alignments improve autoregressive translation

B Yan, S Dalmia, Y Higuchi, G Neubig, F Metze… - arXiv preprint arXiv …, 2022 - arxiv.org

Connectionist Temporal Classification (CTC) is a widely used approach for automatic
speech recognition (ASR) that performs conditionally independent monotonic alignment …

Cited by 29 · All 5 versions

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

S Watanabe, F Boyer, X Chang, P Guo… - 2021 IEEE Data …, 2021 - ieeexplore.ieee.org

This paper describes the recent development of ESPnet (https://github.com/espnet/espnet),
an end-to-end speech processing toolkit. This project was initiated in December 2017 to …

Cited by 54 · All 7 versions

Streaming Transformer ASR with blockwise synchronous beam search

E Tsunoo, Y Kashiwagi… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

The Transformer self-attention network has shown promising performance as an alternative
to recurrent neural networks in end-to-end (E2E) automatic speech recognition (ASR) …

Cited by 50 · All 6 versions

Advanced long-context end-to-end speech recognition using context-expanded transformers

T Hori, N Moritz, C Hori, JL Roux - arXiv preprint arXiv:2104.09426, 2021 - arxiv.org

This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …

Cited by 36 · All 6 versions

End-to-end automatic speech recognition integrated with CTC-based voice activity detection

T Yoshimura, T Hayashi, K Takeda… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper integrates a voice activity detection (VAD) function with end-to-end automatic
speech recognition toward an online speech interface and transcribing very long audio …

Cited by 48 · All 6 versions

Improving hybrid CTC/attention architecture for agglutinative language speech recognition

Z Ren, N Yolwas, W Slamu, R Cao, H Wang - Sensors, 2022 - mdpi.com

Unlike the traditional model, the end-to-end (E2E) ASR model does not require speech
information such as a pronunciation dictionary, and its system is built through a single neural …

Cited by 13 · All 9 versions

Searchable hidden intermediates for end-to-end models of decomposable sequence tasks

S Dalmia, B Yan, V Raunak, F Metze… - arXiv preprint arXiv …, 2021 - arxiv.org

End-to-end approaches for sequence tasks are becoming increasingly popular. Yet for
complex sequence tasks, like speech translation, systems that cascade several models …

Cited by 31 · All 8 versions