SLUE Phase-2: A benchmark suite of diverse spoken language understanding tasks

S Shon, S Arora, CJ Lin, A Pasad, F Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Spoken language understanding (SLU) tasks have been studied for many decades in the
speech research community, but have not received as much attention as lower-level tasks …

Wav2Seq: Pre-training speech-to-text encoder-decoder models using pseudo languages

F Wu, K Kim, S Watanabe, KJ Han… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-
decoder models for speech data. We induce a pseudo language as a compact discrete …

A brief overview of unsupervised neural speech representation learning

L Borgholt, JD Havtorn, J Edin, L Maaløe… - arXiv preprint arXiv …, 2022 - arxiv.org
Unsupervised representation learning for speech processing has matured greatly in the last
few years. Work in computer vision and natural language processing has paved the way, but …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

A study on the integration of pre-trained SSL, ASR, LM and SLU models for spoken language understanding

Y Peng, S Arora, Y Higuchi, Y Ueda… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive
and time-consuming. Recent studies achieved promising results by using pre-trained …

Zero-shot end-to-end spoken language understanding via cross-modal selective self-training

J He, J Salazar, K Yao, H Li, J Cai - arXiv preprint arXiv:2305.12793, 2023 - arxiv.org
End-to-end (E2E) spoken language understanding (SLU) is constrained by the cost of
collecting speech-semantics pairs, especially when label domains change. Hence, we …

Integrating pretrained ASR and LM to perform sequence generation for spoken language understanding

S Arora, H Futami, Y Kashiwagi, E Tsunoo… - arXiv preprint arXiv …, 2023 - arxiv.org
There has been an increased interest in the integration of pretrained speech recognition
(ASR) and language models (LM) into the SLU framework. However, prior methods often …

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

S Arora, A Pasad, CM Chien, J Han, R Sharma… - arXiv preprint arXiv …, 2024 - arxiv.org
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was
recently introduced to address the need for open resources and benchmarking of complex …

End-to-end model for named entity recognition from speech without paired training data

S Mdhaffar, J Duret, T Parcollet, Y Estève - arXiv preprint arXiv …, 2022 - arxiv.org
Recent work has shown that end-to-end neural approaches have become very popular for
spoken language understanding (SLU). By the term end-to-end, one considers the use …