Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition

X Chen, YY Lin, K Wang, Y He, Z Ma - arXiv preprint arXiv:2306.07949, 2023 - arxiv.org
End-to-end (E2E) systems have shown comparable performance to hybrid systems for
automatic speech recognition (ASR). Word timings, as a by-product of ASR, are essential in …

Bring dialogue-context into RNN-T for streaming ASR.

J Hou, J Chen, W Li, Y Tang, J Zhang, Z Ma - INTERSPEECH, 2022 - isca-archive.org
Recently the conversational end-to-end (E2E) automatic speech recognition (ASR) models,
which directly integrate dialogue-context such as historical utterances into E2E models …

A Neural Time Alignment Module for End-to-End Automatic Speech Recognition

D Jiang, C Zhang, PC Woodland - isca-archive.org
End-to-end trainable (E2E) automatic speech recognition (ASR) systems have low word
error rates, but they do not model timings or silence by default unlike hidden Markov model …