Ego-Topo: Environment Affordances from Egocentric Video
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Abstract
First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on his intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a human-centric model of a physical space (such as a kitchen) that captures (1) the primary spatial zones of interaction and (2) the likely activities they support. Our approach decomposes a space into a topological map derived from first-person activity, organizing an ego-video into a series of visits to the different zones. Further, we show how to link zones across multiple related environments (e.g., from videos of multiple kitchens) to obtain a consolidated representation of environment functionality. On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video.
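To make the idea of a topological map of activity zones concrete, the sketch below shows one plausible way to group consecutive video clips into "visits" and link revisited zones as graph edges. This is a minimal illustration, not the authors' implementation: it assumes clip embeddings from some pretrained visual encoder, uses a simple cosine-similarity threshold to decide when a clip revisits a known zone, and relies on the `networkx` library for the graph; all names (`build_zone_graph`, `sim_threshold`) are hypothetical.

```python
# Minimal sketch of organizing an egocentric video into visits to spatial
# zones and linking them in a topological graph (zones = nodes, observed
# transitions between zones = weighted edges).
import numpy as np
import networkx as nx


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def build_zone_graph(clip_embeddings, sim_threshold=0.8):
    """Assign each clip to a zone (new or revisited) and record transitions."""
    graph = nx.Graph()        # nodes: zone ids, edges: transitions between zones
    zone_centers = []         # running mean embedding per zone
    prev_zone = None
    for emb in clip_embeddings:
        # Match the clip to the most similar existing zone, or create a new one.
        sims = [cosine(emb, c) for c in zone_centers]
        if sims and max(sims) >= sim_threshold:
            zone = int(np.argmax(sims))
            zone_centers[zone] = 0.9 * zone_centers[zone] + 0.1 * emb
        else:
            zone = len(zone_centers)
            zone_centers.append(np.asarray(emb, dtype=float).copy())
            graph.add_node(zone)
        # A change of zone marks the end of one visit and the start of another.
        if prev_zone is not None and prev_zone != zone:
            w = graph.get_edge_data(prev_zone, zone, {"weight": 0})["weight"]
            graph.add_edge(prev_zone, zone, weight=w + 1)
        prev_zone = zone
    return graph


# Usage with placeholder embeddings (in practice these would come from frames):
# rng = np.random.default_rng(0)
# clips = [rng.normal(size=128) for _ in range(50)]
# g = build_zone_graph(clips)
# print(g.number_of_nodes(), "zones,", g.number_of_edges(), "transitions")
```

A consolidated cross-environment representation, as described in the abstract, could then be approximated by merging nodes from several such graphs whose zone embeddings are mutually similar, though the paper's actual linking procedure is more involved than this threshold-based sketch.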