A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Music source separation with band-split RNN

Y Luo, J Yu - IEEE/ACM Transactions on Audio, Speech, and …, 2023 - ieeexplore.ieee.org
The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …

The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement

S Leglaive, L Borne, E Tzinis, M Sadeghi… - arXiv preprint arXiv …, 2023 - arxiv.org
Supervised speech enhancement models are trained using artificially generated mixtures of
clean speech and noise signals, which may not match real-world recording conditions at test …

Self-remixing: Unsupervised speech separation via separation and remixing

K Saijo, T Ogawa - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
We present Self-Remixing, a novel self-supervised speech separation method, which refines
a pre-trained separation model in an unsupervised manner. Self-Remixing consists of a …

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

S Leglaive, M Fraticelli, H ElGhazaly, L Borne… - arXiv preprint arXiv …, 2024 - arxiv.org
Supervised models for speech enhancement are trained using artificially generated mixtures
of clean speech and noise signals. However, the synthetic training conditions may not …

Semi-supervised time domain target speaker extraction with attention

Z Wang, R Giri, S Venkataramani, U Isik… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work, we propose Exformer, a time-domain architecture for target speaker extraction. It
consists of a pre-trained speaker embedder network and a separator network based on …

A systematic comparison of phonetic aware techniques for speech enhancement

O Tal, M Mandel, F Kreuk, Y Adi - arXiv preprint arXiv:2206.11000, 2022 - arxiv.org
Speech enhancement has seen great improvement in recent years using end-to-end neural
networks. However, most models are agnostic to the spoken phonetic content. Recently …

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

J Zhang, C Zorila, R Doddipatla, J Barker - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we explore an improved framework to train a monoaural neural enhancement
model for robust speech recognition. The designed training framework extends the existing …

Mixcycle: unsupervised speech separation via cyclic mixture permutation invariant training

E Karamatlı, S Kırbız - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org
We introduce two unsupervised source separation methods, which involve self-supervised
training from single-channel two-source speech mixtures. Our first method, mixture …

Remixing-based Unsupervised Source Separation from Scratch

K Saijo, T Ogawa - arXiv preprint arXiv:2309.00376, 2023 - isca-archive.org
We propose an unsupervised approach for training separation models from scratch using
RemixIT and Self-Remixing, which are recently proposed self-supervised learning methods …