End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
ESPnet-ST: All-in-one speech translation toolkit
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …
Espresso: A fast end-to-end neural speech recognition toolkit
We present Espresso, an open-source, modular, extensible end-to-end neural automatic
speech recognition (ASR) toolkit based on the deep learning library PyTorch and the …
CTC alignments improve autoregressive translation
Connectionist Temporal Classification (CTC) is a widely used approach for automatic
speech recognition (ASR) that performs conditionally independent monotonic alignment …
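The snippet describes CTC as a conditionally independent, monotonic alignment model. Below is a minimal sketch of what that means. It is not taken from the listed paper. Its assumptions are that index 0 is the blank symbol and that the inputs are per-frame posterior probabilities.

The sketch defines three hypothetical helpers:

- `collapse` maps a frame-level alignment to a label sequence.
- `ctc_prob_bruteforce` scores a target by summing, over every alignment that collapses to it, a product of independent per-frame probabilities.
- `ctc_prob_forward` computes the same quantity with the standard CTC forward recursion.

```python
import itertools
import math
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def collapse(alignment: Sequence[int], blank: int = BLANK) -> List[int]:
    """Map a frame-level alignment to labels: merge repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for a in alignment:
        if a != prev and a != blank:
            out.append(a)
        prev = a
    return out


def ctc_prob_bruteforce(probs: Sequence[Sequence[float]],
                        target: Sequence[int], blank: int = BLANK) -> float:
    """Sum over every alignment that collapses to `target`. Each alignment's
    probability is a product of per-frame terms (conditional independence)."""
    T, V = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(V), repeat=T):
        if collapse(path, blank) == list(target):
            total += math.prod(probs[t][path[t]] for t in range(T))
    return total


def ctc_prob_forward(probs: Sequence[Sequence[float]],
                     target: Sequence[int], blank: int = BLANK) -> float:
    """Same quantity via the forward recursion over the blank-extended
    target (blank, y1, blank, y2, ..., blank); states only move forward."""
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                      # stay in the same state
            if s >= 1:
                a += alpha[s - 1]             # advance by one state
            # skip a blank only when it separates two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # a valid path ends in the last label or the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


if __name__ == "__main__":
    # toy posteriors: 3 frames, vocabulary {blank, 1, 2}
    probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3]]
    print(ctc_prob_bruteforce(probs, [1, 2]), ctc_prob_forward(probs, [1, 2]))
```

The brute-force version makes the independence assumption explicit but scales as V^T. The forward recursion exploits monotonicity to reach the same value in O(T·S) time.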
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet),
an end-to-end speech processing toolkit. This project was initiated in December 2017 to …
Streaming Transformer ASR with blockwise synchronous beam search
E Tsunoo, Y Kashiwagi… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
The Transformer self-attention network has shown promising performance as an alternative
to recurrent neural networks in end-to-end (E2E) automatic speech recognition (ASR) …
Advanced long-context end-to-end speech recognition using context-expanded transformers
This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lectures and conversational speech. Most end-to-end ASR models are …
End-to-end automatic speech recognition integrated with CTC-based voice activity detection
This paper integrates a voice activity detection (VAD) function with end-to-end automatic
speech recognition toward an online speech interface and transcribing very long audio …
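The snippet does not state how the CTC output is turned into voice activity decisions. Below is a minimal sketch under a stated assumption: a gap of at least `min_blank_run` consecutive frames whose CTC argmax is blank is treated as non-speech, and the audio is split there.

The helper names `frame_argmax` and `speech_segments` are hypothetical, as are the threshold value and the blank index 0. None of them come from the listed paper.

```python
from typing import List, Sequence, Tuple


def frame_argmax(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Greedy per-frame label from CTC posteriors (one row per frame)."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in posteriors]


def speech_segments(frame_labels: Sequence[int], blank: int = 0,
                    min_blank_run: int = 5) -> List[Tuple[int, int]]:
    """Return speech segments as [start, end) frame ranges.

    Assumption: a blank gap of at least `min_blank_run` frames between two
    non-blank frames marks non-speech; shorter gaps (ordinary spacing
    between CTC spikes) stay inside the current segment.
    """
    segments: List[Tuple[int, int]] = []
    seg_start = None   # first frame of the open segment, or None
    last_speech = -1   # index of the most recent non-blank frame
    for t, lab in enumerate(frame_labels):
        if lab == blank:
            continue
        gap = t - last_speech - 1  # blank frames since the previous token
        if seg_start is not None and gap >= min_blank_run:
            # long blank run: close the segment after the previous token
            segments.append((seg_start, last_speech + 1))
            seg_start = None
        if seg_start is None:
            seg_start = t
        last_speech = t
    if seg_start is not None:
        segments.append((seg_start, last_speech + 1))
    return segments


if __name__ == "__main__":
    labels = [0, 0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 5, 5, 0, 0]
    print(speech_segments(labels, min_blank_run=5))
```

In such a setup, each returned segment could then be decoded independently, which is one way to keep memory bounded on very long recordings. The threshold trades off splitting inside pauses (too small) against merging separate utterances (too large).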
Improving hybrid CTC/attention architecture for agglutinative language speech recognition
Z Ren, N Yolwas, W Slamu, R Cao, H Wang - Sensors, 2022 - mdpi.com
Unlike the traditional model, the end-to-end (E2E) ASR model does not require speech
information such as a pronunciation dictionary, and its system is built through a single neural …
Searchable hidden intermediates for end-to-end models of decomposable sequence tasks
End-to-end approaches for sequence tasks are becoming increasingly popular. Yet for
complex sequence tasks, like speech translation, systems that cascade several models …