A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

SSAST: Self-supervised audio spectrogram transformer

Y Gong, CI Lai, YA Chung, J Glass - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Recently, neural networks based purely on self-attention, such as the Vision Transformer
(ViT), have been shown to outperform deep learning models constructed with convolutional …

PARP: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

A survey of reasoning with foundation models

J Sun, C Zheng, E Xie, Z Liu, R Chu, J Qiu, J Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …

Very short-term residential load forecasting based on deep-autoformer

Y Jiang, T Gao, Y Dai, R Si, J Hao, J Zhang, DW Gao - Applied Energy, 2022 - Elsevier
Very short-term load forecasting (VSTLF) plays an essential role in guaranteeing effective
electricity dispatching and generating in residential microgrid systems. However, the …

DSMT-Net: Dual self-supervised multi-operator transformation for multi-source endoscopic ultrasound diagnosis

J Li, P Zhang, T Wang, L Zhu, R Liu… - … on Medical Imaging, 2023 - ieeexplore.ieee.org
Pancreatic cancer has the worst prognosis of all cancers. The clinical application of
endoscopic ultrasound (EUS) for the assessment of pancreatic cancer risk and of deep …

Injecting text in self-supervised speech pretraining

Z Chen, Y Zhang, A Rosenberg… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Self-supervised pretraining for Automated Speech Recognition (ASR) has shown varied
degrees of success. In this paper, we propose to jointly learn representations during …

Autoregressive predictive coding: A comprehensive study

GP Yang, SL Yeh, YA Chung, J Glass… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We review autoregressive predictive coding (APC), an approach to learn speech
representation by predicting a future frame given the past frames. We present three different …

Exploring self-supervised representation ensembles for COVID-19 cough classification

H Xue, FD Salim - Proceedings of the 27th ACM SIGKDD Conference on …, 2021 - dl.acm.org
The usage of smartphone-collected respiratory sound, trained with deep learning models,
for detecting and classifying COVID-19 becomes popular recently. It removes the need for in …