SpeechPrompt v2: Prompt tuning for speech classification tasks

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org

The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Cited by 74 · Related articles · All 6 versions

[PDF] neurips.cc

HyPoradise: An open baseline for generative speech recognition with large language models

C Chen, Y Hu, CHH Yang… - Advances in …, 2024 - proceedings.neurips.cc

Advancements in deep neural networks have allowed automatic speech recognition (ASR)
systems to attain human parity on several publicly available clean speech datasets …

Cited by 21 · Related articles · All 9 versions

[PDF] arxiv.org

Towards audio language modeling - an overview

H Wu, X Chen, YC Lin, K Chang, HL Chung… - arXiv preprint arXiv …, 2024 - arxiv.org

Neural audio codecs are initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …

Cited by 5 · Related articles · All 2 versions

[PDF] arxiv.org

Can ChatGPT detect intent? Evaluating large language models for spoken language understanding

M He, PN Garner - arXiv preprint arXiv:2305.13512, 2023 - arxiv.org

Recently, large pretrained language models have demonstrated strong language
understanding capabilities. This is particularly reflected in their zero-shot and in-context …

Cited by 28 · Related articles · All 5 versions

[PDF] arxiv.org

Whispering LLaMA: A cross-modal generative error correction framework for speech recognition

S Radhakrishnan, CHH Yang, SA Khan… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce a new cross-modal fusion technique designed for generative error correction
in automatic speech recognition (ASR). Our methodology leverages both acoustic …

Cited by 22 · Related articles · All 6 versions

[PDF] arxiv.org

SpeechGen: Unlocking the generative power of speech language models with prompts

H Wu, KW Chang, YK Wu, H Lee - arXiv preprint arXiv:2306.02207, 2023 - arxiv.org

Large language models (LLMs) have gained considerable attention for Artificial Intelligence
Generated Content (AIGC), particularly with the emergence of ChatGPT. However, the direct …

Cited by 16 · Related articles · All 2 versions

[PDF] arxiv.org

Joint audio and speech understanding

Y Gong, AH Liu, H Luo, L Karlinsky… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Humans are surrounded by audio signals that include both speech and non-speech sounds.
The recognition and understanding of speech and non-speech audio events, along with a …

Cited by 22 · Related articles · All 7 versions

[PDF] aclanthology.org

Integrating pre-trained speech and language models for end-to-end speech recognition

Y Hono, K Mitsuda, T Zhao, K Mitsui… - Findings of the …, 2024 - aclanthology.org

Advances in machine learning have made it possible to perform various text and speech
processing tasks, such as automatic speech recognition (ASR), in an end-to-end (E2E) …

Cited by 8 · Related articles · All 2 versions

[PDF] arxiv.org

Dynamic-SUPERB: Towards a dynamic, collaborative, and comprehensive instruction-tuning benchmark for speech

C Huang, KH Lu, SH Wang, CY Hsiao… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Text language models have shown remarkable zero-shot capability in generalizing to
unseen tasks when provided with well-formulated instructions. However, existing studies in …

Cited by 12 · Related articles · All 3 versions

[PDF] arxiv.org

UniverSLU: Universal spoken language understanding for diverse classification and sequence generation tasks with a single network

S Arora, H Futami, J Jung, Y Peng, R Sharma… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent studies have demonstrated promising outcomes by employing large language
models with multi-tasking capabilities. They utilize prompts to guide the model's behavior …

Cited by 5 · Related articles · All 2 versions