Continual self-training with bootstrapped remixing for speech enhancement

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 100 · Related articles · All 6 versions

Music source separation with band-split RNN

Y Luo, J Yu - IEEE/ACM Transactions on Audio, Speech, and …, 2023 - ieeexplore.ieee.org

The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …

Cited by 65 · Related articles · All 4 versions

The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement

S Leglaive, L Borne, E Tzinis, M Sadeghi… - arXiv preprint arXiv …, 2023 - arxiv.org

Supervised speech enhancement models are trained using artificially generated mixtures of
clean speech and noise signals, which may not match real-world recording conditions at test …

Cited by 12 · Related articles · All 17 versions

Self-remixing: Unsupervised speech separation via separation and remixing

K Saijo, T Ogawa - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

We present Self-Remixing, a novel self-supervised speech separation method, which refines
a pre-trained separation model in an unsupervised manner. Self-Remixing consists of a …

Cited by 7 · Related articles · All 5 versions

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

S Leglaive, M Fraticelli, H ElGhazaly, L Borne… - arXiv preprint arXiv …, 2024 - arxiv.org

Supervised models for speech enhancement are trained using artificially generated mixtures
of clean speech and noise signals. However, the synthetic training conditions may not …

Cited by 1 · Related articles · All 14 versions

Semi-supervised time domain target speaker extraction with attention

Z Wang, R Giri, S Venkataramani, U Isik… - arXiv preprint arXiv …, 2022 - arxiv.org

In this work, we propose Exformer, a time-domain architecture for target speaker extraction. It
consists of a pre-trained speaker embedder network and a separator network based on …

Cited by 6 · Related articles · All 2 versions

A systematic comparison of phonetic aware techniques for speech enhancement

O Tal, M Mandel, F Kreuk, Y Adi - arXiv preprint arXiv:2206.11000, 2022 - arxiv.org

Speech enhancement has seen great improvement in recent years using end-to-end neural
networks. However, most models are agnostic to the spoken phonetic content. Recently …

Cited by 5 · Related articles · All 6 versions

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

J Zhang, C Zorila, R Doddipatla, J Barker - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we explore an improved framework to train a monoaural neural enhancement
model for robust speech recognition. The designed training framework extends the existing …

Cited by 6 · Related articles · All 5 versions

Mixcycle: unsupervised speech separation via cyclic mixture permutation invariant training

E Karamatlı, S Kırbız - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org

We introduce two unsupervised source separation methods, which involve self-supervised
training from single-channel two-source speech mixtures. Our first method, mixture …

Cited by 4 · Related articles · All 5 versions

Remixing-based Unsupervised Source Separation from Scratch

K Saijo, T Ogawa - arXiv preprint arXiv:2309.00376, 2023 - isca-archive.org

We propose an unsupervised approach for training separation models from scratch using
RemixIT and Self-Remixing, which are recently proposed self-supervised learning methods …