Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Aria digital twin: A new benchmark dataset for egocentric 3d machine perception

X Pan, N Charron, Y Yang, S Peters… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract We introduce the Aria Digital Twin (ADT), an egocentric dataset captured using Aria
glasses with extensive object-, environment-, and human-level ground truth. This ADT release …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Self-supervised object detection from egocentric videos

P Akiva, J Huang, KJ Liang, R Kovvuri… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding the visual world from the perspective of humans (egocentric) has been a
long-standing challenge in computer vision. Egocentric videos exhibit high scene complexity …

Instance tracking in 3D scenes from egocentric videos

Y Zhao, H Ma, S Kong… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Egocentric sensors such as AR/VR devices capture human-object interactions and offer the
potential to provide task-assistance by recalling 3D locations of objects of interest in the …

Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations

X Chen, W Wang, H Tang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Abstract 3D spatial understanding is highly valuable in the context of semantic modeling of
environments, agents, and their relationships. Semantic modeling approaches employed on …

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

T Nguyen, Y Bin, J Xiao, L Qu, Y Li, JZ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans use multiple senses to comprehend the environment. Vision and language are two
of the most vital senses since they allow us to easily communicate our thoughts and …

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking

Y Xu, J Li, Z Yang, Y Yang, Y Zhuang - arXiv preprint arXiv:2307.02508, 2023 - arxiv.org
The Associating Objects with Transformers (AOT) framework has exhibited exceptional
performance in a wide range of complex scenarios for video object tracking and …