Prompting the hidden talent of web-scale speech models for zero-shot task generalization

P Peng, B Yan, S Watanabe, D Harwath - arXiv preprint arXiv:2305.11095, 2023 - arxiv.org
We investigate the emergent abilities of the recently proposed web-scale speech model
Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks …

Approximate nearest neighbour phrase mining for contextual speech recognition

M Bleeker, P Swietojanski, S Braun… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents an extension to train end-to-end Context-Aware Transformer
Transducer (CATT) models by using a simple yet efficient method of mining hard negative …

LAE-ST-MOE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-Switching ASR

G Ma, W Wang, Y Li, Y Yang, B Du… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Recently, to mitigate the confusion between different languages in code-switching (CS)
automatic speech recognition (ASR), the conditionally factorized models, such as the …

Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers

J Silovsky, L Deng, A Argueta, T Arvizo, R Hsiao… - arXiv preprint arXiv …, 2023 - arxiv.org
Voice technology has become ubiquitous recently. However, the accuracy, and hence
experience, in different languages varies significantly, which makes the technology not …