Egotracks: A long-term egocentric visual object tracking dataset

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Cited by 52 Related articles All 5 versions

[PDF] thecvf.com

Aria digital twin: A new benchmark dataset for egocentric 3d machine perception

X Pan, N Charron, Y Yang, S Peters… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract We introduce the Aria Digital Twin (ADT), an egocentric dataset captured using Aria
glasses with extensive object, environment, and human level ground truth. This ADT release …

Cited by 19 Related articles All 7 versions

[HTML] springer.com

[HTML] An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer

What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Cited by 17 Related articles All 7 versions

[PDF] thecvf.com

Self-supervised object detection from egocentric videos

P Akiva, J Huang, KJ Liang, R Kovvuri… - Proceedings of the …, 2023 - openaccess.thecvf.com

Understanding the visual world from the perspective of humans (egocentric) has been a
long-standing challenge in computer vision. Egocentric videos exhibit high scene complexity …

Cited by 3 Related articles All 4 versions

[PDF] thecvf.com

Instance tracking in 3D scenes from egocentric videos

Y Zhao, H Ma, S Kong… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Egocentric sensors such as AR/VR devices capture human-object interactions and offer the
potential to provide task-assistance by recalling 3D locations of objects of interest in the …

Cited by 2 Related articles All 3 versions

[PDF] neurips.cc

Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations

X Chen, W Wang, H Tang… - Advances in Neural …, 2024 - proceedings.neurips.cc

Abstract 3D spatial understanding is highly valuable in the context of semantic modeling of
environments, agents, and their relationships. Semantic modeling approaches employed on …

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

T Nguyen, Y Bin, J Xiao, L Qu, Y Li, JZ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Humans use multiple senses to comprehend the environment. Vision and language are two
of the most vital senses since they allow us to easily communicate our thoughts and …

Cited by 1 Related articles All 2 versions

[PDF] arxiv.org

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking

Y Xu, J Li, Z Yang, Y Yang, Y Zhuang - arXiv preprint arXiv:2307.02508, 2023 - arxiv.org

The Associating Objects with Transformers (AOT) framework has exhibited exceptional
performance in a wide range of complex scenarios for video object tracking and …

Cited by 1 Related articles All 2 versions