Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis
P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …
Self-supervised learning for speech enhancement through synthesis
Modern speech enhancement (SE) networks typically implement noise suppression through
time-frequency masking, latent representation masking, or discriminative signal prediction …
Improving meeting inclusiveness using speech interruption analysis
Meetings are a pervasive method of communication within all types of companies and
organizations, and using remote collaboration systems to conduct meetings has increased …
Speech emotion diarization: Which emotion appears when?
Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However,
emotions conveyed through speech should be considered as discrete speech events with …
Audio-visual speech enhancement using self-supervised learning to improve speech intelligibility in cochlear implant simulations
Individuals with hearing impairments face challenges in their ability to comprehend speech,
particularly in noisy environments. The aim of this study is to explore the effectiveness of …
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
K Fujita, H Sato, T Ashihara… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from
reference speech using self-supervised learning (SSL) speech representations, can …
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models
Pre-trained self-supervised learning (SSL) models have achieved remarkable success in
various speech tasks. However, their potential in target speech extraction (TSE) has not …
Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings
AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective
for categorical problems such as automatic speech recognition and lip-reading. This …
Extending audio masked autoencoders toward audio restoration
Audio classification and restoration are among the major downstream tasks in audio signal
processing. However, restoration derives less of a benefit from pretrained models compared …
An adapter based multi-label pre-training for speech separation and enhancement
In recent years, self-supervised learning (SSL) has achieved tremendous success in various
speech tasks due to its power to extract representations from massive unlabeled data …