Multi-output RNN-T joint networks for multi-task learning of ASR and auxiliary tasks
We propose a multi-output joint network architecture for the RNN transducer (RNN-T), for multi-task
modeling of ASR and auxiliary tasks that rely on ASR outputs. Each output of the joint …
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is
used to supplement paired audio-text data, has shown promising improvements for word …