Egodistill: Egocentric head motion distillation for efficient video understanding

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Cited by 116 · Related articles · All 5 versions

[PDF] arxiv.org

Towards generalizable zero-shot manipulation via translating human interaction plans

H Bharadhwaj, A Gupta, V Kumar… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

We pursue the goal of developing robots that can interact zero-shot with generic unseen
objects via a diverse repertoire of manipulation skills and show how passive human videos …

Cited by 27 · Related articles · All 2 versions

[PDF] thecvf.com

Multimodal distillation for egocentric action recognition

G Radevski, D Grujicic, M Blaschko… - Proceedings of the …, 2023 - openaccess.thecvf.com

The focal point of egocentric video understanding is modelling hand-object interactions.
Standard models, e.g., CNNs or Vision Transformers, which receive RGB frames as input …

Cited by 18 · Related articles · All 7 versions

[PDF] springer.com

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer

What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Cited by 34 · Related articles · All 7 versions

[PDF] arxiv.org

Self-supervised visual learning from interactions with objects

A Aubret, C Teulière, J Triesch - European Conference on Computer …, 2025 - Springer

Self-supervised learning (SSL) has revolutionized visual representation learning, but has
not achieved the robustness of human vision. A reason for this could be that SSL does not …

Cited by 6 · Related articles · All 7 versions

[PDF] neurips.cc

Epic fields: Marrying 3d geometry and video understanding

V Tschernezki, A Darkhalil, Z Zhu… - Advances in …, 2024 - proceedings.neurips.cc

Neural rendering is fuelling a unification of learning, 3D geometry and video understanding
that has been waiting for more than two decades. Progress, however, is still hampered by a …

Cited by 19 · Related articles · All 13 versions

[PDF] thecvf.com

Action-slot: Visual action-centric representations for multi-label atomic activity recognition in traffic scenes

CH Kung, SW Lu, YH Tsai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

In this paper we study multi-label atomic activity recognition. Despite the notable progress in
action recognition it is still challenging to recognize atomic activities due to a deficiency in …

Cited by 3 · Related articles · All 4 versions

[PDF] arxiv.org

Multimodal cross-domain few-shot learning for egocentric action recognition

M Hatano, R Hachiuma, R Fujii, H Saito - European Conference on …, 2025 - Springer

We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input
and unlabeled target data for egocentric action recognition. This paper simultaneously …

Cited by 2 · Related articles · All 4 versions

[PDF] arxiv.org

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

H Yun, R Gao, I Ananthabhotla, A Kumar… - … on Computer Vision, 2025 - Springer

Egocentric videos provide comprehensive contexts for user and scene understanding,
spanning multisensory perception to behavioral interaction. We propose Spherical World …

Cited by 1 · Related articles · All 7 versions

Beyond "Taming Electric Scooters": Disentangling Understandings of Micromobility Naturalistic Riding

M Tabatabaie, S He, H Wang, KG Shin - Proceedings of the ACM on …, 2024 - dl.acm.org

Electric (e)-scooters have emerged as a popular, ubiquitous, and first/last-mile micromobility
transportation option within and across many cities worldwide. With the increasing situation …

Cited by 1 · Related articles · All 2 versions