Boosting self-supervised embeddings for speech enhancement

[HTML] Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer

Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Cited by 14 · Related articles · All 8 versions

[PDF] arxiv.org

Self-supervised learning for speech enhancement through synthesis

B Irvin, M Stamenovic, M Kegler… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Modern speech enhancement (SE) networks typically implement noise suppression through
time-frequency masking, latent representation masking, or discriminative signal prediction …

Cited by 13 · Related articles · All 4 versions

[PDF] arxiv.org

Improving meeting inclusiveness using speech interruption analysis

SW Fu, Y Fan, Y Hosseinkashi, J Gupchup… - Proceedings of the 30th …, 2022 - dl.acm.org

Meetings are a pervasive method of communication within all types of companies and
organizations, and using remote collaboration systems to conduct meetings has increased …

Cited by 10 · Related articles · All 4 versions

[PDF] arxiv.org

Speech emotion diarization: Which emotion appears when?

Y Wang, M Ravanelli, A Yacoubi - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org

Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However,
emotions conveyed through speech should be considered as discrete speech events with …

Cited by 7 · Related articles · All 4 versions

[PDF] arxiv.org

Audio-visual speech enhancement using self-supervised learning to improve speech intelligibility in cochlear implant simulations

RL Lai, JC Hou, M Gogate, K Dashtipour… - arXiv preprint arXiv …, 2023 - arxiv.org

Individuals with hearing impairments face challenges in their ability to comprehend speech,
particularly in noisy environments. The aim of this study is to explore the effectiveness of …

Cited by 3 · Related articles · All 2 versions

[PDF] arxiv.org

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

K Fujita, H Sato, T Ashihara… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from
reference speech using self-supervised learning (SSL) speech representations, can …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Target Speech Extraction with Pre-Trained Self-Supervised Learning Models

J Peng, M Delcroix, T Ochiai, O Plchot… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Pre-trained self-supervised learning (SSL) models have achieved remarkable success in
various speech tasks. However, their potential in target speech extraction (TSE) has not …

Cited by 2 · Related articles · All 3 versions

[PDF] arxiv.org

Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings

IC Chern, KH Hung, YT Chen, T Hussain… - … , Speech, and Signal …, 2023 - ieeexplore.ieee.org

AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective
for categorical problems such as automatic speech recognition and lip-reading. This …

Cited by 8 · Related articles · All 4 versions

[PDF] arxiv.org

Extending audio masked autoencoders toward audio restoration

Z Zhong, H Shi, M Hirano, K Shimada… - … IEEE Workshop on …, 2023 - ieeexplore.ieee.org

Audio classification and restoration are among major downstream tasks in audio signal
processing. However, restoration derives less of a benefit from pretrained models compared …

Cited by 4 · Related articles · All 4 versions

[PDF] arxiv.org

An adapter based multi-label pre-training for speech separation and enhancement

T Wang, X Chen, Z Chen, S Yu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

In recent years, self-supervised learning (SSL) has achieved tremendous success in various
speech tasks due to its power to extract representations from massive unlabeled data …

Cited by 5 · Related articles · All 3 versions