Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Video ReCap: Recursive Captioning of Hour-Long Videos

MM Islam, N Ho, X Yang, T Nagarajan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Most video captioning models are designed to process short video clips of a few seconds and
output text describing low-level visual concepts (e.g., objects, scenes, atomic actions). However …

Learning to Segment Referred Objects from Narrated Egocentric Videos

Y Shen, H Wang, X Yang, M Feiszli… - Proceedings of the …, 2024 - openaccess.thecvf.com
Egocentric videos provide a first-person perspective of the wearer's activities involving
simultaneous interactions with multiple objects. In this work, we propose the task of weakly …

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Y Huang, G Chen, J Xu, M Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Being able to map the activities of others into one's own point of view is a fundamental
human skill, present even from a very early age. Taking a step toward understanding this human …

VideoLLM-online: Online Video Large Language Model for Streaming Video

J Chen, Z Lv, S Wu, KQ Lin, C Song… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Large Language Models (LLMs) have been enhanced with vision capabilities,
enabling them to comprehend images, videos, and interleaved vision-language content …

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

KRY Nagasinghe, H Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we explore the capability of an agent to construct a logical sequence of action
steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from …

EgoVideo: Exploring egocentric foundation model and downstream adaptation

B Pei, G Chen, J Xu, Y He, Y Liu, K Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including
five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building …

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

S Wu, J Chen, KQ Lin, Q Wang, Y Gao, Q Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while
increasing the number of vision tokens generally enhances visual understanding, it also …

Human Action Anticipation: A Survey

B Lai, S Toyer, T Nagarajan, R Girdhar, S Zha… - arXiv preprint arXiv …, 2024 - arxiv.org
Predicting future human behavior is an increasingly popular topic in computer vision, driven
by the interest in applications such as autonomous vehicles, digital assistants and human …

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

H Lin, T Nagarajan, N Ballas, M Assran… - arXiv preprint arXiv …, 2024 - arxiv.org
Procedural video representation learning is an active research area whose objective is to
learn an agent that can anticipate and forecast the future given the present video input …