Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition
End-to-end (E2E) systems have shown comparable performance to hybrid systems for
automatic speech recognition (ASR). Word timings, as a by-product of ASR, are essential in …
Bring dialogue-context into RNN-T for streaming ASR.
Recently, the conversational end-to-end (E2E) automatic speech recognition (ASR) models,
which directly integrate dialogue-context such as historical utterances into E2E models …
A Neural Time Alignment Module for End-to-End Automatic Speech Recognition
D Jiang, C Zhang, PC Woodland - isca-archive.org
End-to-end trainable (E2E) automatic speech recognition (ASR) systems have low word
error rates, but they do not model timings or silence by default unlike hidden Markov model …