SLUE phase-2: A benchmark suite of diverse spoken language understanding tasks
Spoken language understanding (SLU) tasks have been studied for many decades in the
speech research community, but have not received as much attention as lower-level tasks …
Wav2Seq: Pre-training speech-to-text encoder-decoder models using pseudo languages
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-
decoder models for speech data. We induce a pseudo language as a compact discrete …
A brief overview of unsupervised neural speech representation learning
Unsupervised representation learning for speech processing has matured greatly in the last
few years. Work in computer vision and natural language processing has paved the way, but …
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …
A study on the integration of pre-trained SSL, ASR, LM and SLU models for spoken language understanding
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive
and time-consuming. Recent studies achieved promising results by using pre-trained …
Zero-shot end-to-end spoken language understanding via cross-modal selective self-training
End-to-end (E2E) spoken language understanding (SLU) is constrained by the cost of
collecting speech-semantics pairs, especially when label domains change. Hence, we …
Integrating pretrained asr and lm to perform sequence generation for spoken language understanding
There has been an increased interest in the integration of pretrained speech recognition
(ASR) and language models (LM) into the SLU framework. However, prior methods often …
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was
recently introduced to address the need for open resources and benchmarking of complex …
End-to-end model for named entity recognition from speech without paired training data
Recent work has shown that end-to-end neural approaches have become very popular for
spoken language understanding (SLU). The term end-to-end refers to the use …