Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …
Towards generalizable zero-shot manipulation via translating human interaction plans
H Bharadhwaj, A Gupta, V Kumar… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
We pursue the goal of developing robots that can interact zero-shot with generic unseen
objects via a diverse repertoire of manipulation skills and show how passive human videos …
Multimodal distillation for egocentric action recognition
The focal point of egocentric video understanding is modelling hand-object interactions.
Standard models, e.g., CNNs or Vision Transformers, which receive RGB frames as input …
An outlook into the future of egocentric vision
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …
Self-supervised visual learning from interactions with objects
Self-supervised learning (SSL) has revolutionized visual representation learning, but has
not achieved the robustness of human vision. A reason for this could be that SSL does not …
EPIC Fields: Marrying 3D geometry and video understanding
V Tschernezki, A Darkhalil, Z Zhu… - Advances in …, 2024 - proceedings.neurips.cc
Neural rendering is fuelling a unification of learning, 3D geometry and video understanding
that has been waiting for more than two decades. Progress, however, is still hampered by a …
Action-slot: Visual action-centric representations for multi-label atomic activity recognition in traffic scenes
In this paper we study multi-label atomic activity recognition. Despite the notable progress in
action recognition it is still challenging to recognize atomic activities due to a deficiency in …
Multimodal cross-domain few-shot learning for egocentric action recognition
We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input
and unlabeled target data for egocentric action recognition. This paper simultaneously …
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Egocentric videos provide comprehensive contexts for user and scene understanding,
spanning multisensory perception to behavioral interaction. We propose Spherical World …
Beyond "Taming Electric Scooters": Disentangling Understandings of Micromobility Naturalistic Riding
Electric (e)-scooters have emerged as a popular, ubiquitous, and first/last-mile micromobility
transportation option within and across many cities worldwide. With the increasing situation …