Multi-output RNN-T joint networks for multi-task learning of ASR and auxiliary tasks

W Wang, D Zhao, S Ding, H Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose a multi-output joint network architecture for RNN-T transducer, for multi-task
modeling of ASR and auxiliary tasks that rely on ASR outputs. Each output of the joint …

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

S Bijwadia, S Chang, W Wang, Z Meng… - arXiv preprint arXiv …, 2023 - arxiv.org
Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is
used to supplement paired audio-text data, has shown promising improvements for word …