Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Mix and localize: Localizing sound sources in mixtures

X Hu, Z Chen, A Owens - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
We present a method for simultaneously localizing multiple sound sources within a visual
scene. This task requires a model to both group a sound mixture into individual sources, and …

What's all the fuss about free universal sound separation data?

S Wisdom, H Erdogan, DPW Ellis… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for
experiments in separating mixtures of an unknown number of sounds from an open domain …

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

E Tzinis, Y Adi, VK Ithapu, B Xu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We present RemixIT, a simple yet effective self-supervised method for training speech
enhancement without the need of a single isolated in-domain speech nor a noise waveform …

ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration

C Li, J Shi, W Zhang, AS Subramanian… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …

AudioScopeV2: Audio-visual attention architectures for calibrated open-domain on-screen sound separation

E Tzinis, S Wisdom, T Remez, JR Hershey - European Conference on …, 2022 - Springer
We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-screen sound
separation system which is capable of learning to separate sounds and associate them with …

Domain‐specific neural networks improve automated bird sound recognition already with small amount of local data

P Lauha, P Somervuo, P Lehikoinen… - Methods in Ecology …, 2022 - Wiley Online Library
An automatic bird sound recognition system is a useful tool for collecting data of different
bird species for ecological analysis. Together with autonomous recording units (ARUs), such …

Into the wild with AudioScope: Unsupervised audio-visual separation of on-screen sounds

E Tzinis, S Wisdom, A Jansen, S Hershey… - arXiv preprint arXiv …, 2020 - arxiv.org
Recent progress in deep learning has enabled many advances in sound separation and
visual scene understanding. However, extracting sound sources which are apparent in …

Improving bird classification with unsupervised sound separation

T Denton, S Wisdom, JR Hershey - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
This paper addresses the problem of species classification in bird song recordings. The
massive amount of available field recordings of birds presents an opportunity to use …