Decoupling magnitude and phase estimation with deep ResUNet for music source separation

Q Kong, Y Cao, H Liu, K Choi, Y Wang - arXiv preprint arXiv:2109.05418, 2021 - arxiv.org
Deep neural network based methods have been successfully applied to music source
separation. They typically learn a mapping from a mixture spectrogram to a set of source …

DPT-FSNet: Dual-path transformer based full-band and sub-band fusion network for speech enhancement

F Dang, H Chen, P Zhang - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Sub-band models have achieved promising results due to their ability to model local
patterns in the spectrogram. Some studies further improve the performance by fusing sub …

DCCRN+: Channel-wise subband DCCRN with SNR estimation for speech enhancement

S Lv, Y Hu, S Zhang, L Xie - arXiv preprint arXiv:2106.08672, 2021 - arxiv.org
Deep complex convolution recurrent network (DCCRN), which extends CRN with complex
structure, has achieved superior performance in MOS evaluation in Interspeech 2020 deep …

Separate what you describe: Language-queried audio source separation

X Liu, H Liu, Q Kong, X Mei, J Zhao, Q Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we introduce the task of language-queried audio source separation (LASS),
which aims to separate a target source from an audio mixture based on a natural language …

TEA-PSE 2.0: Sub-band network for real-time personalized speech enhancement

Y Ju, S Zhang, W Rao, Y Wang, T Yu… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Personalized speech enhancement (PSE) utilizes additional cues like speaker embeddings
to remove background noise and interfering speech and extract the speech from target …

VoiceFixer: Toward general speech restoration with neural vocoder

H Liu, Q Kong, Q Tian, Y Zhao, DL Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus
on single-task speech restoration (SSR), such as speech denoising or speech declipping …

The Sound Demixing Challenge 2023 – Music Demixing Track

G Fabbro, S Uhlich, CH Lai, W Choi… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge
(SDX'23). We provide a summary of the challenge setup and introduce the task of robust …

CWS-PResUNet: Music source separation with channel-wise subband phase-aware ResUNet

H Liu, Q Kong, J Liu - arXiv preprint arXiv:2112.04685, 2021 - arxiv.org
Music source separation (MSS) shows active progress with deep learning models in recent
years. Many MSS models perform separations on spectrograms by estimating bounded ratio …

Target sound extraction with variable cross-modality clues

C Li, Y Qian, Z Chen, D Wang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Automatic target sound extraction (TSE) is a machine learning approach to mimic the human
auditory perception capability of attending to a sound source of interest from a mixture of …

Surrey system for DCASE 2022 Task 5: Few-shot bioacoustic event detection with segment-level metric learning

H Liu, X Liu, X Mei, Q Kong, W Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Few-shot audio event detection is a task that detects the occurrence time of a novel sound
class given a few examples. In this work, we propose a system based on segment-level …