CWCL: Cross-modal transfer with continuously weighted contrastive loss

RS Srinivasa, J Cho, C Yang… - Advances in …, 2023 - proceedings.neurips.cc
This paper considers contrastive training for cross-modal zero-shot transfer wherein a pre-
trained model in one modality is used for representation learning in another domain using …

SLUE phase-2: A benchmark suite of diverse spoken language understanding tasks

S Shon, S Arora, CJ Lin, A Pasad, F Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Spoken language understanding (SLU) tasks have been studied for many decades in the
speech research community, but have not received as much attention as lower-level tasks …

WhiSLU: End-to-end spoken language understanding with Whisper

M Wang, Y Li, J Guo, X Qiao, Z Li, H Shang… - Proc …, 2023 - isca-archive.org
Spoken Language Understanding (SLU) systems commonly use cascading
structures. However, these systems are prone to error propagation, information loss, high …

UniverSLU: Universal spoken language understanding for diverse classification and sequence generation tasks with a single network

S Arora, H Futami, J Jung, Y Peng, R Sharma… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent studies have demonstrated promising outcomes by employing large language
models with multi-tasking capabilities. They utilize prompts to guide the model's behavior …

A comparative study on E-Branchformer vs Conformer in speech recognition, translation, and understanding tasks

Y Peng, K Kim, F Wu, B Yan, S Arora, W Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Conformer, a convolution-augmented Transformer variant, has become the de facto encoder
architecture for speech processing due to its superior performance in various tasks …

Integrating pretrained ASR and LM to perform sequence generation for spoken language understanding

S Arora, H Futami, Y Kashiwagi, E Tsunoo… - arXiv preprint arXiv …, 2023 - arxiv.org
There has been an increased interest in the integration of pretrained speech recognition
(ASR) and language models (LM) into the SLU framework. However, prior methods often …

Deliberation model for on-device spoken language understanding

D Le, A Shrivastava, P Tomasello, S Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language
understanding (SLU), where a streaming automatic speech recognition (ASR) model …

Improving end-to-end speech processing by efficient text data utilization with latent synthesis

J Lu, W Huang, N Zheng, X Zeng, YT Yeung… - arXiv preprint arXiv …, 2023 - arxiv.org
Training a high-performance end-to-end (E2E) speech processing model requires an
enormous amount of labeled speech data, especially in the era of data-centric artificial …

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

S Kim, A Shrivastava, D Le, J Lin, O Kalinli… - arXiv preprint arXiv …, 2023 - arxiv.org
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

E Tsunoo, H Futami, Y Kashiwagi, S Arora… - arXiv preprint arXiv …, 2023 - arxiv.org
Although frame-based models, such as CTC and transducers, have an affinity for streaming
automatic speech recognition, their decoding uses no future knowledge, which could lead to …