A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org
The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Prompting the hidden talent of web-scale speech models for zero-shot task generalization

P Peng, B Yan, S Watanabe, D Harwath - arXiv preprint arXiv:2305.11095, 2023 - arxiv.org
We investigate the emergent abilities of the recently proposed web-scale speech model
Whisper, by adapting it to unseen tasks with prompt engineering. We selected three tasks …
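As a rough, assumption-level illustration of this kind of prompt engineering (not the paper's actual prompts or tasks), the open-source openai-whisper package lets a text prompt bias the decoder; the audio file and prompt string below are hypothetical:

import whisper  # pip install openai-whisper

# Load a pretrained web-scale checkpoint and steer it with a decoder prompt.
model = whisper.load_model("base")
result = model.transcribe(
    "meeting.wav",                                        # hypothetical audio file
    language="en",
    initial_prompt="Glossary: wav2vec, HuBERT, Whisper.", # domain/task hint fed to the decoder
)
print(result["text"])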

Can ChatGPT detect intent? Evaluating large language models for spoken language understanding

M He, PN Garner - arXiv preprint arXiv:2305.13512, 2023 - arxiv.org
Recently, large pretrained language models have demonstrated strong language
understanding capabilities. This is particularly reflected in their zero-shot and in-context …

SpeechPrompt v2: Prompt tuning for speech classification tasks

KW Chang, YK Wang, H Shen, I Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt tuning is a technique that tunes a small set of parameters to steer a pre-trained
language model (LM) to directly generate the output for downstream tasks. Recently, prompt …
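A minimal PyTorch sketch of the underlying idea, under assumed dimensions and a toy backbone rather than the SpeechPrompt v2 setup: only the soft prompt and a small head receive gradients, while the pre-trained model stays frozen.

import torch
import torch.nn as nn

class PromptTunedClassifier(nn.Module):
    def __init__(self, backbone, embed_dim=768, prompt_len=10, num_classes=5):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze the pre-trained model
            p.requires_grad = False
        # trainable soft prompts prepended to every input sequence
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                       # x: (batch, seq_len, embed_dim)
        prompts = self.prompt.unsqueeze(0).expand(x.size(0), -1, -1)
        h = self.backbone(torch.cat([prompts, x], dim=1))
        return self.head(h.mean(dim=1))         # pool and classify

# toy frozen backbone standing in for a pre-trained speech LM
backbone = nn.TransformerEncoder(nn.TransformerEncoderLayer(768, 8, batch_first=True), 2)
model = PromptTunedClassifier(backbone)
print(model(torch.randn(2, 50, 768)).shape)     # torch.Size([2, 5])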

Exploring efficient-tuning methods in self-supervised speech models

ZC Chen, CL Fu, CY Liu, SWD Li… - 2022 IEEE spoken …, 2023 - ieeexplore.ieee.org
In this study, we aim to explore efficient tuning methods for speech self-supervised learning.
Recent studies show that self-supervised learning (SSL) can learn powerful representations …

SpeechGen: Unlocking the generative power of speech language models with prompts

H Wu, KW Chang, YK Wu, H Lee - arXiv preprint arXiv:2306.02207, 2023 - arxiv.org
Large language models (LLMs) have gained considerable attention for Artificial Intelligence
Generated Content (AIGC), particularly with the emergence of ChatGPT. However, the direct …

From English to more languages: Parameter-efficient model reprogramming for cross-lingual speech recognition

CHH Yang, B Li, Y Zhang, N Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this work, we propose a new parameter-efficient learning framework based on neural
model reprogramming for cross-lingual speech recognition, which can re-purpose well …
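As an assumption-level sketch of the general reprogramming recipe (not the paper's exact design), a small trainable input transform and output label mapping can re-purpose a frozen acoustic model for a new language; all module sizes and the stand-in backbone below are hypothetical.

import torch
import torch.nn as nn

class Reprogrammed(nn.Module):
    def __init__(self, frozen_model, feat_dim=80, src_vocab=32, tgt_vocab=40):
        super().__init__()
        self.frozen = frozen_model
        for p in self.frozen.parameters():       # the pre-trained model is never updated
            p.requires_grad = False
        self.input_transform = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)  # trainable
        self.label_map = nn.Linear(src_vocab, tgt_vocab)                                 # trainable

    def forward(self, feats):                     # feats: (batch, time, feat_dim)
        x = self.input_transform(feats.transpose(1, 2)).transpose(1, 2) + feats
        logits = self.frozen(x)                   # (batch, time, src_vocab)
        return self.label_map(logits)             # re-map outputs to the target-language vocabulary

# toy stand-in for a frozen, pre-trained English ASR model
frozen = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 32))
model = Reprogrammed(frozen)
print(model(torch.randn(4, 100, 80)).shape)       # torch.Size([4, 100, 40])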

Toward universal speech enhancement for diverse input conditions

W Zhang, K Saijo, ZQ Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The past decade has witnessed substantial growth of data-driven speech enhancement (SE)
techniques thanks to deep learning. While existing approaches have shown impressive …