Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation

J Chen, Q Mao, D Liu - arXiv preprint arXiv:2007.13975, 2020 - arxiv.org
The dominant speech separation models are based on complex recurrent or convolution
neural network that model speech sequences indirectly conditioning on context, such as …

Past review, current progress, and challenges ahead on the cocktail party problem

Y Qian, C Weng, X Chang, S Wang, D Yu - Frontiers of Information …, 2018 - Springer
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker
when multiple speakers talk simultaneously, is one of the critical problems yet to be solved …

Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks

M Kolbæk, D Yu, ZH Tan… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique.
uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker …

Permutation invariant training of deep models for speaker-independent multi-talker speech separation

D Yu, M Kolbæk, ZH Tan… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
We propose a novel deep learning training criterion, named permutation invariant training
(PIT), for speaker independent multi-talker speech separation, commonly known as the …

Deep clustering: Discriminative embeddings for segmentation and separation

JR Hershey, Z Chen, J Le Roux… - 2016 IEEE international …, 2016 - ieeexplore.ieee.org
We address the problem of "cocktail-party" source separation in a deep learning framework
called deep clustering. Previous deep network approaches to separation have shown …

Unsupervised sound separation using mixture invariant training

S Wisdom, E Tzinis, H Erdogan… - Advances in neural …, 2020 - proceedings.neurips.cc
In recent years, rapid progress has been made on the problem of single-channel sound
separation using supervised training of deep neural networks. In such supervised …

SpEx: Multi-scale time domain speaker extraction network

C Xu, W Rao, ES Chng, H Li - IEEE/ACM transactions on audio …, 2020 - ieeexplore.ieee.org
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …

Energy disaggregation via discriminative sparse coding

J Kolter, S Batra, A Ng - Advances in neural information …, 2010 - proceedings.neurips.cc
Energy disaggregation is the task of taking a whole-home energy signal and separating it
into its component appliances. Studies have shown that having device-level energy …

Paralinguistics in speech and language—state-of-the-art and the challenge

B Schuller, S Steidl, A Batliner, F Burkhardt… - Computer Speech & …, 2013 - Elsevier
Paralinguistic analysis is increasingly turning into a mainstream topic in speech and
language processing. This article aims to provide a broad overview of the constantly …