A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org
The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Prompting the hidden talent of web-scale speech models for zero-shot task generalization

P Peng, B Yan, S Watanabe, D Harwath - arXiv preprint arXiv:2305.11095, 2023 - arxiv.org
We investigate the emergent abilities of the recently proposed web-scale speech model
Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks …
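As a rough, assumption-level illustration of this kind of prompt engineering (not the paper's actual prompts or tasks), the open-source openai-whisper package lets a text prompt bias the decoder; the audio file and prompt string below are hypothetical:

import whisper  # pip install openai-whisper

# Load a pretrained web-scale checkpoint and steer it with a decoder prompt.
model = whisper.load_model("base")
result = model.transcribe(
    "meeting.wav",                                        # hypothetical audio file
    language="en",
    initial_prompt="Glossary: wav2vec, HuBERT, Whisper.", # domain/task hint fed to the decoder
)
print(result["text"])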

Can ChatGPT detect intent? Evaluating large language models for spoken language understanding

M He, PN Garner - arXiv preprint arXiv:2305.13512, 2023 - arxiv.org
Recently, large pretrained language models have demonstrated strong language
understanding capabilities. This is particularly reflected in their zero-shot and in-context …

SpeechPrompt v2: Prompt tuning for speech classification tasks

KW Chang, YK Wang, H Shen, I Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt tuning is a technique that tunes a small set of parameters to steer a pre-trained
language model (LM) to directly generate the output for downstream tasks. Recently, prompt …
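A minimal PyTorch sketch of the underlying idea, under assumed dimensions and a toy backbone rather than the SpeechPrompt v2 setup: only the soft prompt and a small head receive gradients, while the pre-trained model stays frozen.

import torch
import torch.nn as nn

class PromptTunedClassifier(nn.Module):
    def __init__(self, backbone, embed_dim=768, prompt_len=10, num_classes=5):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze the pre-trained model
            p.requires_grad = False
        # trainable soft prompts prepended to every input sequence
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                       # x: (batch, seq_len, embed_dim)
        prompts = self.prompt.unsqueeze(0).expand(x.size(0), -1, -1)
        h = self.backbone(torch.cat([prompts, x], dim=1))
        return self.head(h.mean(dim=1))         # pool and classify

# toy frozen backbone standing in for a pre-trained speech LM
backbone = nn.TransformerEncoder(nn.TransformerEncoderLayer(768, 8, batch_first=True), 2)
model = PromptTunedClassifier(backbone)
print(model(torch.randn(2, 50, 768)).shape)     # torch.Size([2, 5])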

Exploring efficient-tuning methods in self-supervised speech models

ZC Chen, CL Fu, CY Liu, SWD Li… - 2022 IEEE spoken …, 2023 - ieeexplore.ieee.org
In this study, we aim to explore efficient tuning methods for speech self-supervised learning.
Recent studies show that self-supervised learning (SSL) can learn powerful representations …

SpeechGen: Unlocking the generative power of speech language models with prompts

H Wu, KW Chang, YK Wu, H Lee - arXiv preprint arXiv:2306.02207, 2023 - arxiv.org
Large language models (LLMs) have gained considerable attention for Artificial Intelligence
Generated Content (AIGC), particularly with the emergence of ChatGPT. However, the direct …

From English to more languages: Parameter-efficient model reprogramming for cross-lingual speech recognition

CHH Yang, B Li, Y Zhang, N Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this work, we propose a new parameter-efficient learning framework based on neural
model reprogramming for cross-lingual speech recognition, which can re-purpose well …
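As an assumption-level sketch of the general reprogramming recipe (not the paper's exact design), a small trainable input transform and output label mapping can re-purpose a frozen acoustic model for a new language; all module sizes and the stand-in backbone below are hypothetical.

import torch
import torch.nn as nn

class Reprogrammed(nn.Module):
    def __init__(self, frozen_model, feat_dim=80, src_vocab=32, tgt_vocab=40):
        super().__init__()
        self.frozen = frozen_model
        for p in self.frozen.parameters():       # the pre-trained model is never updated
            p.requires_grad = False
        self.input_transform = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)  # trainable
        self.label_map = nn.Linear(src_vocab, tgt_vocab)                                 # trainable

    def forward(self, feats):                     # feats: (batch, time, feat_dim)
        x = self.input_transform(feats.transpose(1, 2)).transpose(1, 2) + feats
        logits = self.frozen(x)                   # (batch, time, src_vocab)
        return self.label_map(logits)             # re-map outputs to the target-language vocabulary

# toy stand-in for a frozen, pre-trained English ASR model
frozen = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 32))
model = Reprogrammed(frozen)
print(model(torch.randn(4, 100, 80)).shape)       # torch.Size([4, 100, 40])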

Toward universal speech enhancement for diverse input conditions

W Zhang, K Saijo, ZQ Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The past decade has witnessed substantial growth of data-driven speech enhancement (SE)
techniques thanks to deep learning. While existing approaches have shown impressive …