SoundSpaces: Audio-visual navigation in 3D environments
Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …
Audio to body dynamics
E Shlizerman, L Dery, H Schoen… - Proceedings of the …, 2018 - openaccess.thecvf.com
We present a method that gets as input an audio of violin or piano playing, and outputs a
video of skeleton predictions which are further used to animate an avatar. The key idea is to …
Multi-speaker tracking from an audio–visual sensing device
Compact multi-sensor platforms are portable and thus desirable for robotics and personal-
assistance tasks. However, compared to physically distributed sensors, the size of these …
DeepMOT: A differentiable framework for training multiple object trackers
Abstract Multiple Object Tracking accuracy and precision (MOTA and MOTP) are two
standard and widely-used metrics to assess the quality of multiple object trackers. They are …
Variational bayesian inference for audio-visual tracking of multiple speakers
In this article, we address the problem of tracking multiple speakers via the fusion of visual
and auditory information. We propose to exploit the complementary nature and roles of …
Audio-visual tracking of concurrent speakers
Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging
task, especially when sound and video are collected with a compact sensing platform. In this …
Online localization and tracking of multiple moving speakers in reverberant environments
We address the problem of online localization and tracking of multiple moving speakers in
reverberant environments. This paper has the following contributions. We use the direct-path …
A deep concept graph network for interaction-aware trajectory prediction
Temporal patterns (how vehicles behave in our observed past) underline our reasoning of
how people drive on the road, and can explain why we make certain predictions about …
Audio-visual embodied navigation
Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …
Self-supervised neural audio-visual sound source localization via probabilistic spatial modeling
Detecting sound source objects within visual observation is important for autonomous robots
to comprehend surrounding environments. Since sounding objects have a large variety with …