CWCL: Cross-modal transfer with continuously weighted contrastive loss

RS Srinivasa, J Cho, C Yang… - Advances in …, 2023 - proceedings.neurips.cc
This paper considers contrastive training for cross-modal zero-shot transfer wherein a pre-
trained model in one modality is used for representation learning in another domain using …

SLUE phase-2: A benchmark suite of diverse spoken language understanding tasks

S Shon, S Arora, CJ Lin, A Pasad, F Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Spoken language understanding (SLU) tasks have been studied for many decades in the
speech research community, but have not received as much attention as lower-level tasks …

WhiSLU: End-to-end spoken language understanding with Whisper

M Wang, Y Li, J Guo, X Qiao, Z Li, H Shang… - Proc …, 2023 - isca-archive.org
Spoken Language Understanding (SLU) systems commonly use cascading
structures. However, these systems are prone to error propagation, information loss, high …

UniverSLU: Universal spoken language understanding for diverse classification and sequence generation tasks with a single network

S Arora, H Futami, J Jung, Y Peng, R Sharma… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent studies have demonstrated promising outcomes by employing large language
models with multi-tasking capabilities. They utilize prompts to guide the model's behavior …

A comparative study on E-Branchformer vs Conformer in speech recognition, translation, and understanding tasks

Y Peng, K Kim, F Wu, B Yan, S Arora, W Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Conformer, a convolution-augmented Transformer variant, has become the de facto encoder
architecture for speech processing due to its superior performance in various tasks …

Integrating pretrained ASR and LM to perform sequence generation for spoken language understanding

S Arora, H Futami, Y Kashiwagi, E Tsunoo… - arXiv preprint arXiv …, 2023 - arxiv.org
There has been an increased interest in the integration of pretrained speech recognition
(ASR) and language models (LM) into the SLU framework. However, prior methods often …

Deliberation model for on-device spoken language understanding

D Le, A Shrivastava, P Tomasello, S Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language
understanding (SLU), where a streaming automatic speech recognition (ASR) model …

Improving end-to-end speech processing by efficient text data utilization with latent synthesis

J Lu, W Huang, N Zheng, X Zeng, YT Yeung… - arXiv preprint arXiv …, 2023 - arxiv.org
Training a high-performance end-to-end (E2E) speech processing model requires an
enormous amount of labeled speech data, especially in the era of data-centric artificial …

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

S Kim, A Shrivastava, D Le, J Lin, O Kalinli… - arXiv preprint arXiv …, 2023 - arxiv.org
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

E Tsunoo, H Futami, Y Kashiwagi, S Arora… - arXiv preprint arXiv …, 2023 - arxiv.org
Although frame-based models, such as CTC and transducers, have an affinity for streaming
automatic speech recognition, their decoding uses no future knowledge, which could lead to …