A comprehensive review of polyphonic sound event detection

TK Chan, CS Chin - IEEE Access, 2020 - ieeexplore.ieee.org
One of the most amazing functions of the human auditory system is the ability to detect all
kinds of sound events in the environment. With the technologies and hardware advances …

Unsupervised sound separation using mixture invariant training

S Wisdom, E Tzinis, H Erdogan… - Advances in neural …, 2020 - proceedings.neurips.cc
In recent years, rapid progress has been made on the problem of single-channel sound
separation using supervised training of deep neural networks. In such supervised …

Sudo rm -rf: Efficient networks for universal audio source separation

E Tzinis, Z Wang, P Smaragdis - 2020 IEEE 30th International …, 2020 - ieeexplore.ieee.org
In this paper, we present an efficient neural network for end-to-end general purpose audio
source separation. Specifically, the backbone structure of this convolutional network is the …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

What's all the fuss about free universal sound separation data?

S Wisdom, H Erdogan, DPW Ellis… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for
experiments in separating mixtures of an unknown number of sounds from an open domain …

AudioScopeV2: Audio-visual attention architectures for calibrated open-domain on-screen sound separation

E Tzinis, S Wisdom, T Remez, JR Hershey - European Conference on …, 2022 - Springer
We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-screen sound
separation system which is capable of learning to separate sounds and associate them with …

Weakly-supervised audio-visual segmentation

S Mo, B Raj - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for
sound sources in a video. Previous work applied a comprehensive manually designed …

Into the wild with AudioScope: Unsupervised audio-visual separation of on-screen sounds

E Tzinis, S Wisdom, A Jansen, S Hershey… - arXiv preprint arXiv …, 2020 - arxiv.org
Recent progress in deep learning has enabled many advances in sound separation and
visual scene understanding. However, extracting sound sources which are apparent in …

Separate what you describe: Language-queried audio source separation

X Liu, H Liu, Q Kong, X Mei, J Zhao, Q Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we introduce the task of language-queried audio source separation (LASS),
which aims to separate a target source from an audio mixture based on a natural language …

Move2Hear: Active audio-visual source separation

S Majumder, Z Al-Halah… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We introduce the active audio-visual source separation problem, where an agent must move
intelligently in order to better isolate the sounds coming from an object of interest in its …