An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

EquiBot: SIM(3)-equivariant diffusion policy for generalizable and data-efficient learning

J Yang, Z Cao, C Deng, R Antonova, S Song… - arXiv preprint arXiv …, 2024 - arxiv.org
Building effective imitation learning methods that enable robots to learn from limited data
and still generalize across diverse real-world environments is a long-standing problem in …

OKAMI: Teaching humanoid robots manipulation skills through single video imitation

J Li, Y Zhu, Y Xie, Z Jiang, M Seo… - … Annual Conference on …, 2024 - openreview.net
We study the problem of teaching humanoid robots manipulation skills by imitating from
single video demonstrations. We introduce OKAMI, a method that generates a manipulation …

DiffH2O: Diffusion-based synthesis of hand-object interactions from textual descriptions

S Christen, S Hampali, F Sener, E Remelli… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We introduce DiffH2O, a new diffusion-based framework for synthesizing realistic, dexterous
hand-object interactions from natural language. Our model employs a temporal two-stage …

DexCap: Scalable and portable mocap data collection system for dexterous manipulation

C Wang, H Shi, W Wang, R Zhang, L Fei-Fei… - arXiv preprint arXiv …, 2024 - arxiv.org
Imitation learning from human hand motion data presents a promising avenue for imbuing
robots with human-like dexterity in real-world manipulation tasks. Despite this potential …

3D hand pose estimation in everyday egocentric images

A Prakash, R Tu, M Chang, S Gupta - European Conference on Computer …, 2025 - Springer
3D hand pose estimation in everyday egocentric images is challenging for several
reasons: poor visual signal (occlusion from the object of interaction, low resolution & motion …

Robot see robot do: Imitating articulated object manipulation with monocular 4D reconstruction

J Kerr, CM Kim, M Wu, B Yi, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans can learn to manipulate new objects by simply watching others; providing robots
with the ability to learn from such demonstrations would enable a natural interface specifying …

Hamba: Single-view 3D hand reconstruction with graph-guided bi-scanning Mamba

H Dong, A Chharia, W Gou, FV Carrasco… - arXiv preprint arXiv …, 2024 - arxiv.org
3D hand reconstruction from a single RGB image is challenging due to the articulated
motion, self-occlusion, and interaction with objects. Existing SOTA methods employ attention …

Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics

W Cho, J Lee, M Yi, M Kim, T Woo, D Kim, T Ha… - … on Computer Vision, 2025 - Springer
Existing datasets for 3D hand-object interaction are limited either in the data cardinality, data
variations in interaction scenarios, or the quality of annotations. In this work, we present a …

WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild

RA Potamias, J Zhang, J Deng, S Zafeiriou - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, 3D hand pose estimation methods have garnered significant attention due to
their extensive applications in human-computer interaction, virtual reality, and robotics. In …