Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Self-supervised learning for speech enhancement through synthesis

B Irvin, M Stamenovic, M Kegler… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Modern speech enhancement (SE) networks typically implement noise suppression through
time-frequency masking, latent representation masking, or discriminative signal prediction …

Improving meeting inclusiveness using speech interruption analysis

SW Fu, Y Fan, Y Hosseinkashi, J Gupchup… - Proceedings of the 30th …, 2022 - dl.acm.org
Meetings are a pervasive method of communication within all types of companies and
organizations, and using remote collaboration systems to conduct meetings has increased …

Speech emotion diarization: Which emotion appears when?

Y Wang, M Ravanelli, A Yacoubi - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However,
emotions conveyed through speech should be considered as discrete speech events with …

Audio-visual speech enhancement using self-supervised learning to improve speech intelligibility in cochlear implant simulations

RL Lai, JC Hou, M Gogate, K Dashtipour… - arXiv preprint arXiv …, 2023 - arxiv.org
Individuals with hearing impairments face challenges in their ability to comprehend speech,
particularly in noisy environments. The aim of this study is to explore the effectiveness of …

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

K Fujita, H Sato, T Ashihara… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from
reference speech using self-supervised learning (SSL) speech representations, can …

Target Speech Extraction with Pre-Trained Self-Supervised Learning Models

J Peng, M Delcroix, T Ochiai, O Plchot… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Pre-trained self-supervised learning (SSL) models have achieved remarkable success in
various speech tasks. However, their potential in target speech extraction (TSE) has not …

Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings

IC Chern, KH Hung, YT Chen, T Hussain… - … , Speech, and Signal …, 2023 - ieeexplore.ieee.org
AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective
for categorical problems such as automatic speech recognition and lip-reading. This …

Extending audio masked autoencoders toward audio restoration

Z Zhong, H Shi, M Hirano, K Shimada… - … IEEE Workshop on …, 2023 - ieeexplore.ieee.org
Audio classification and restoration are among major downstream tasks in audio signal
processing. However, restoration derives less of a benefit from pretrained models compared …

An adapter based multi-label pre-training for speech separation and enhancement

T Wang, X Chen, Z Chen, S Yu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
In recent years, self-supervised learning (SSL) has achieved tremendous success in various
speech tasks due to its power to extract representations from massive unlabeled data …