A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
Ssast: Self-supervised audio spectrogram transformer
Recently, neural networks based purely on self-attention, such as the Vision Transformer
(ViT), have been shown to outperform deep learning models constructed with convolutional …
Parp: Prune, adjust and re-prune for self-supervised speech recognition
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …
A survey of reasoning with foundation models
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …
Very short-term residential load forecasting based on deep-autoformer
Very short-term load forecasting (VSLTF) plays an essential role in guaranteeing effective
electricity dispatching and generating in residential microgrid systems. However, the …
Dsmt-net: Dual self-supervised multi-operator transformation for multi-source endoscopic ultrasound diagnosis
Pancreatic cancer has the worst prognosis of all cancers. The clinical application of
endoscopic ultrasound (EUS) for the assessment of pancreatic cancer risk and of deep …
Injecting text in self-supervised speech pretraining
Self-supervised pretraining for Automated Speech Recognition (ASR) has shown varied
degrees of success. In this paper, we propose to jointly learn representations during …
Autoregressive predictive coding: A comprehensive study
We review autoregressive predictive coding (APC), an approach to learn speech
representation by predicting a future frame given the past frames. We present three different …
Exploring self-supervised representation ensembles for covid-19 cough classification
The usage of smartphone-collected respiratory sound, trained with deep learning models,
for detecting and classifying COVID-19 has recently become popular. It removes the need for in …