SoundSpaces: Audio-visual navigation in 3D environments

C Chen, U Jain, C Schissler, SVA Gari… - Computer Vision–ECCV …, 2020 - Springer
Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …

Audio to body dynamics

E Shlizerman, L Dery, H Schoen… - Proceedings of the …, 2018 - openaccess.thecvf.com
We present a method that gets as input an audio of violin or piano playing, and outputs a
video of skeleton predictions which are further used to animate an avatar. The key idea is to …

Multi-speaker tracking from an audio–visual sensing device

X Qian, A Brutti, O Lanz, M Omologo… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Compact multi-sensor platforms are portable and thus desirable for robotics and
personal-assistance tasks. However, compared to physically distributed sensors, the size of these …

DeepMOT: A differentiable framework for training multiple object trackers

Y Xu, Y Ban, X Alameda-Pineda… - arXiv preprint arXiv …, 2019 - xavirema.eu
Multiple Object Tracking accuracy and precision (MOTA and MOTP) are two
standard and widely-used metrics to assess the quality of multiple object trackers. They are …

Variational Bayesian inference for audio-visual tracking of multiple speakers

Y Ban, X Alameda-Pineda, L Girin… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
In this article, we address the problem of tracking multiple speakers via the fusion of visual
and auditory information. We propose to exploit the complementary nature and roles of …

Audio-visual tracking of concurrent speakers

X Qian, A Brutti, O Lanz, M Omologo… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging
task, especially when sound and video are collected with a compact sensing platform. In this …

Online localization and tracking of multiple moving speakers in reverberant environments

X Li, Y Ban, L Girin, X Alameda-Pineda… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
We address the problem of online localization and tracking of multiple moving speakers in
reverberant environments. This paper has the following contributions. We use the direct-path …

A deep concept graph network for interaction-aware trajectory prediction

Y Ban, X Li, G Rosman, I Gilitschenski… - … on Robotics and …, 2022 - ieeexplore.ieee.org
Temporal patterns (how vehicles behave in our observed past) underline our reasoning of
how people drive on the road, and can explain why we make certain predictions about …

Audio-visual embodied navigation

C Chen, U Jain, C Schissler, SVA Gari, Z Al-Halah… - …, 2019 - researchgate.net
Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …

Self-supervised neural audio-visual sound source localization via probabilistic spatial modeling

Y Masuyama, Y Bando, K Yatabe… - 2020 IEEE/RSJ …, 2020 - ieeexplore.ieee.org
Detecting sound source objects within visual observation is important for autonomous robots
to comprehend surrounding environments. Since sounding objects have a large variety with …