An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction

M Cobos, J Ahrens, K Kowalczyk, A Politis - EURASIP Journal on Audio …, 2022 - Springer
The domain of spatial audio comprises methods for capturing, processing, and reproducing
audio content that contains spatial information. Data-based methods are those that operate …

Multi-speaker tracking from an audio–visual sensing device

X Qian, A Brutti, O Lanz, M Omologo… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Compact multi-sensor platforms are portable and thus desirable for robotics and personal-
assistance tasks. However, compared to physically distributed sensors, the size of these …

Translation of a higher order ambisonics sound scene based on parametric decomposition

M Kentgens, A Behler, P Jax - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
This paper presents a novel 3DoF+ system that allows to navigate, i.e., change position, in
scene-based spatial audio content beyond the sweet spot of a Higher Order Ambisonics …

DMMAN: A two-stage audio–visual fusion framework for sound separation and event localization

R Hu, S Zhou, ZR Tang, S Chang, Q Huang, Y Liu… - Neural Networks, 2021 - Elsevier
Videos are used widely as the media platforms for human beings to touch the physical
change of the world. However, we always receive the mixed sound from the multiple sound …

Towards generating ambisonics using audio-visual cue for virtual reality

A Rana, C Ozcinar, A Smolic - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Ambisonics, i.e., a full-sphere surround sound, is quintessential with 360° visual content to
provide a realistic virtual reality (VR) experience. While 360° visual content capture gained a …

Qualitative evaluation of media device orchestration for immersive spatial audio reproduction

J Francombe, J Woodcock, RJ Hughes… - Journal of the Audio …, 2018 - eprints.soton.ac.uk
The challenge of installing and setting up dedicated spatial audio systems can make it
difficult to deliver immersive listening experiences to the general public. However, the …

Audio-visual speaker tracking: Progress, challenges, and future directions

J Zhao, Y Xu, X Qian, D Berghi, P Wu, M Cui… - arXiv preprint arXiv …, 2023 - arxiv.org
Audio-visual speaker tracking has drawn increasing attention over the past few years due to
its academic values and wide application. Audio and visual modalities can provide …

Tragic Talkers: A Shakespearean sound- and light-field dataset for audio-visual machine learning research

D Berghi, M Volino, PJB Jackson - Proceedings of the 19th ACM …, 2022 - dl.acm.org
3D audio-visual production aims to deliver immersive and interactive experiences to the
consumer. Yet, faithfully reproducing real-world 3D scenes remains a challenging task. This …

Real-time low-latency music source separation using Hybrid spectrogram-TasNet

S Venkatesh, A Benilov, P Coleman… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
There have been significant advances in deep learning for music demixing in recent years.
However, there has been little attention given to how these neural networks can be adapted …

VISR – a versatile open software framework for audio signal processing

A Franck, FM Fazi - Audio Engineering Society Conference: 2018 AES …, 2018 - aes.org
Software plays an increasingly important role in spatial and object-based audio. Realtime
and interactive rendering is often needed to subjectively evaluate and demonstrate …