Single-channel speech separation using sparse non-negative matrix factorization.

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 53

Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation

J Chen, Q Mao, D Liu - arXiv preprint arXiv:2007.13975, 2020 - arxiv.org

The dominant speech separation models are based on complex recurrent or convolution
neural network that model speech sequences indirectly conditioning on context, such as …

Cited by 309

Past review, current progress, and challenges ahead on the cocktail party problem

Y Qian, C Weng, X Chang, S Wang, D Yu - Frontiers of Information …, 2018 - Springer

The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker
when multiple speakers talk simultaneously, is one of the critical problems yet to be solved …

Cited by 96

Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks

M Kolbæk, D Yu, ZH Tan… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org

In this paper, we propose the utterance-level permutation invariant training (uPIT) technique.
uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker …

Cited by 902

Permutation invariant training of deep models for speaker-independent multi-talker speech separation

D Yu, M Kolbæk, ZH Tan… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

We propose a novel deep learning training criterion, named permutation invariant training
(PIT), for speaker independent multi-talker speech separation, commonly known as the …

Cited by 974

Deep clustering: Discriminative embeddings for segmentation and separation

JR Hershey, Z Chen, J Le Roux… - 2016 IEEE international …, 2016 - ieeexplore.ieee.org

We address the problem of "cocktail-party" source separation in a deep learning framework
called deep clustering. Previous deep network approaches to separation have shown …

Cited by 1557

Unsupervised sound separation using mixture invariant training

S Wisdom, E Tzinis, H Erdogan… - Advances in neural …, 2020 - proceedings.neurips.cc

In recent years, rapid progress has been made on the problem of single-channel sound
separation using supervised training of deep neural networks. In such supervised …

Cited by 196

Spex: Multi-scale time domain speaker extraction network

C Xu, W Rao, ES Chng, H Li - IEEE/ACM transactions on audio …, 2020 - ieeexplore.ieee.org

Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …

Cited by 169

Energy disaggregation via discriminative sparse coding

J Kolter, S Batra, A Ng - Advances in neural information …, 2010 - proceedings.neurips.cc

Energy disaggregation is the task of taking a whole-home energy signal and separating it
into its component appliances. Studies have shown that having device-level energy …

Cited by 493

Paralinguistics in speech and language—state-of-the-art and the challenge

B Schuller, S Steidl, A Batliner, F Burkhardt… - Computer Speech & …, 2013 - Elsevier

Paralinguistic analysis is increasingly turning into a mainstream topic in speech and
language processing. This article aims to provide a broad overview of the constantly …

Cited by 406