SLUE Phase-2: A benchmark suite of diverse spoken language understanding tasks

S Shon, S Arora, CJ Lin, A Pasad, F Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Spoken language understanding (SLU) tasks have been studied for many decades in the
speech research community, but have not received as much attention as lower-level tasks …

Wav2Seq: Pre-training speech-to-text encoder-decoder models using pseudo languages

F Wu, K Kim, S Watanabe, KJ Han… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-
decoder models for speech data. We induce a pseudo language as a compact discrete …

A brief overview of unsupervised neural speech representation learning

L Borgholt, JD Havtorn, J Edin, L Maaløe… - arXiv preprint arXiv …, 2022 - arxiv.org
Unsupervised representation learning for speech processing has matured greatly in the last
few years. Work in computer vision and natural language processing has paved the way, but …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

A study on the integration of pre-trained SSL, ASR, LM and SLU models for spoken language understanding

Y Peng, S Arora, Y Higuchi, Y Ueda… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive
and time-consuming. Recent studies achieved promising results by using pre-trained …

Zero-shot end-to-end spoken language understanding via cross-modal selective self-training

J He, J Salazar, K Yao, H Li, J Cai - arXiv preprint arXiv:2305.12793, 2023 - arxiv.org
End-to-end (E2E) spoken language understanding (SLU) is constrained by the cost of
collecting speech-semantics pairs, especially when label domains change. Hence, we …

Integrating pretrained ASR and LM to perform sequence generation for spoken language understanding

S Arora, H Futami, Y Kashiwagi, E Tsunoo… - arXiv preprint arXiv …, 2023 - arxiv.org
There has been an increased interest in the integration of pretrained speech recognition
(ASR) and language models (LM) into the SLU framework. However, prior methods often …

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

S Arora, A Pasad, CM Chien, J Han, R Sharma… - arXiv preprint arXiv …, 2024 - arxiv.org
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was
recently introduced to address the need for open resources and benchmarking of complex …

End-to-end model for named entity recognition from speech without paired training data

S Mdhaffar, J Duret, T Parcollet, Y Estève - arXiv preprint arXiv …, 2022 - arxiv.org
Recent work has shown that end-to-end neural approaches have become very popular for
spoken language understanding (SLU). By the term end-to-end, one considers the use …