MetricGAN+: An improved version of MetricGAN for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

Boosting self-supervised embeddings for speech enhancement

KH Hung, S Fu, HH Tseng, HT Chiang, Y Tsao… - arXiv preprint arXiv …, 2022 - arxiv.org
Self-supervised learning (SSL) representation for speech has achieved state-of-the-art
(SOTA) performance on several downstream tasks. However, there remains room for …

Perceptual contrast stretching on target feature for speech enhancement

R Chao, C Yu, SW Fu, X Lu, Y Tsao - arXiv preprint arXiv:2203.17152, 2022 - arxiv.org
Speech enhancement (SE) performance has improved considerably owing to the use of
deep learning models as a base function. Herein, we propose a perceptual contrast …

Audio-visual speech enhancement using self-supervised learning to improve speech intelligibility in cochlear implant simulations

RL Lai, JC Hou, M Gogate, K Dashtipour… - arXiv preprint arXiv …, 2023 - arxiv.org
Individuals with hearing impairments face challenges in their ability to comprehend speech,
particularly in noisy environments. The aim of this study is to explore the effectiveness of …

An empirical study on the impact of positional encoding in transformer-based monaural speech enhancement

Q Zhang, M Ge, H Zhu, E Ambikairajah… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Transformer architecture has enabled recent progress in speech enhancement. Since
Transformers are position-agnostic, positional encoding is the de facto standard component …

Transformers with competitive ensembles of independent mechanisms

A Lamb, D He, A Goyal, G Ke, CF Liao… - arXiv preprint arXiv …, 2021 - arxiv.org
An important development in deep learning from the earliest MLPs has been a move
towards architectures with structural inductive biases which enable the model to keep …

An Investigation of Incorporating Mamba for Speech Enhancement

R Chao, WH Cheng, M La Quatra… - arXiv preprint arXiv …, 2024 - arxiv.org
This work aims to study a scalable state-space model (SSM), Mamba, for the speech
enhancement (SE) task. We exploit a Mamba-based regression model to characterize …

VSET: A multimodal transformer for visual speech enhancement

K Ramesh, C Xing, W Wang, D Wang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The transformer architecture has shown great capability in learning long-term dependency
and works well in multiple domains. However, transformer has been less considered in …

Improving character error rate is not equal to having clean speech: Speech enhancement for ASR systems with black-box acoustic models

R Sawata, Y Kashiwagi… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
A deep neural network (DNN)-based speech enhancement (SE) aiming to maximize the
performance of an automatic speech recognition (ASR) system is proposed in this paper. In …

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

C Yu, SW Fu, TA Hsieh, Y Tsao, M Ravanelli - arXiv preprint arXiv …, 2021 - arxiv.org
Although deep learning (DL) has achieved notable progress in speech enhancement (SE),
further research is still required for a DL-based SE system to adapt effectively and efficiently …