Hybrid spectrogram and waveform source separation

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 103 Related articles All 3 versions

Hybrid transformers for music source separation

S Rouard, F Massa, A Défossez - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

A natural question arising in Music Source Separation (MSS) is whether long range
contextual information is useful, or whether local acoustic features are sufficient. In other …

Cited by 93 Related articles All 3 versions

Music source separation with band-split RNN

Y Luo, J Yu - IEEE/ACM Transactions on Audio, Speech, and …, 2023 - ieeexplore.ieee.org

The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …

Cited by 64 Related articles All 4 versions

Music demixing challenge 2021

Y Mitsufuji, G Fabbro, S Uhlich, FR Stöter… - Frontiers in Signal …, 2022 - frontiersin.org

Music source separation has been intensively studied in the last decade and tremendous
progress with the advent of deep learning could be observed. Evaluation campaigns such …

Cited by 74 Related articles All 5 versions

Multi-source diffusion models for simultaneous music generation and separation

G Mariani, I Tallini, E Postolache, M Mancusi… - arXiv preprint arXiv …, 2023 - arxiv.org

In this work, we define a diffusion-based generative model capable of both music synthesis
and source separation by learning the score of the joint probability density of sources …

Cited by 19 Related articles All 3 versions

Towards low-distortion multi-channel speech enhancement: The ESPNet-SE submission to the L3DAS22 challenge

YJ Lu, S Cornell, X Chang, W Zhang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of
speech enhancement with 3D Ambisonic microphones. The core of our approach combines …

Cited by 24 Related articles All 5 versions

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

J Hwang, M Hira, C Chen, X Zhang, Z Ni… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims
to accelerate the research and development of audio and speech technologies by providing …

Cited by 6 Related articles All 5 versions

Transfer learning of wav2vec 2.0 for automatic lyric transcription

L Ou, X Gu, Y Wang - arXiv preprint arXiv:2207.09747, 2022 - arxiv.org

Automatic speech recognition (ASR) has progressed significantly in recent years due to the
emergence of large-scale datasets and the self-supervised learning (SSL) paradigm …

Cited by 20 Related articles All 8 versions

The Sound Demixing Challenge 2023 – Music Demixing Track

G Fabbro, S Uhlich, CH Lai, W Choi… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge
(SDX'23). We provide a summary of the challenge setup and introduce the task of robust …

Cited by 9 Related articles All 6 versions

Aero: Audio super resolution in the spectral domain

M Mandel, O Tal, Y Adi - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

We present AERO, an audio super-resolution model that processes speech and music
signals in the spectral domain. AERO is based on an encoder-decoder architecture with …

Cited by 15 Related articles All 4 versions