Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Towards generalizable zero-shot manipulation via translating human interaction plans

H Bharadhwaj, A Gupta, V Kumar… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
We pursue the goal of developing robots that can interact zero-shot with generic unseen
objects via a diverse repertoire of manipulation skills and show how passive human videos …

Multimodal distillation for egocentric action recognition

G Radevski, D Grujicic, M Blaschko… - Proceedings of the …, 2023 - openaccess.thecvf.com
The focal point of egocentric video understanding is modelling hand-object interactions.
Standard models, e.g., CNNs or Vision Transformers, which receive RGB frames as input …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Self-supervised visual learning from interactions with objects

A Aubret, C Teulière, J Triesch - European Conference on Computer …, 2025 - Springer
Self-supervised learning (SSL) has revolutionized visual representation learning, but has
not achieved the robustness of human vision. A reason for this could be that SSL does not …

EPIC Fields: Marrying 3D geometry and video understanding

V Tschernezki, A Darkhalil, Z Zhu… - Advances in …, 2024 - proceedings.neurips.cc
Neural rendering is fuelling a unification of learning, 3D geometry and video understanding
that has been waiting for more than two decades. Progress, however, is still hampered by a …

Action-slot: Visual action-centric representations for multi-label atomic activity recognition in traffic scenes

CH Kung, SW Lu, YH Tsai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
In this paper, we study multi-label atomic activity recognition. Despite the notable progress in
action recognition, it is still challenging to recognize atomic activities due to a deficiency in …

Multimodal cross-domain few-shot learning for egocentric action recognition

M Hatano, R Hachiuma, R Fujii, H Saito - European Conference on …, 2025 - Springer
We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input
and unlabeled target data for egocentric action recognition. This paper simultaneously …

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

H Yun, R Gao, I Ananthabhotla, A Kumar… - … on Computer Vision, 2025 - Springer
Egocentric videos provide comprehensive contexts for user and scene understanding,
spanning multisensory perception to behavioral interaction. We propose Spherical World …

Beyond "Taming Electric Scooters": Disentangling Understandings of Micromobility Naturalistic Riding

M Tabatabaie, S He, H Wang, KG Shin - Proceedings of the ACM on …, 2024 - dl.acm.org
Electric (e)-scooters have emerged as a popular, ubiquitous, and first/last-mile micromobility
transportation option within and across many cities worldwide. With the increasing situation …