Anticipative video transformer

R Girdhar, K Grauman - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

We propose Anticipative Video Transformer (AVT), an end-to-end attention-based
video modeling architecture that attends to the previously observed video in order to …

Cited by 228 · Related articles · All 6 versions

Anticipative feature fusion transformer for multi-modal action anticipation

Z Zhong, D Schneider, M Voit… - Proceedings of the …, 2023 - openaccess.thecvf.com

Although human action anticipation is a task which is inherently multi-modal, state-of-the-art
methods on well known action anticipation datasets leverage this data by applying …

Cited by 43 · Related articles · All 7 versions

Latency matters: Real-time action forecasting transformer

H Girase, N Agarwal, C Choi… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present RAFTformer, a real-time action forecasting transformer for latency aware real-
world action forecasting applications. RAFTformer is a two-stage fully transformer based …

Cited by 14 · Related articles · All 3 versions

Rethinking learning approaches for long-term action anticipation

M Nawhal, AA Jyothi, G Mori - European Conference on Computer Vision, 2022 - Springer

Action anticipation involves predicting future actions having observed the initial portion of a
video. Typically, the observed video is processed as a whole to obtain a video-level …

Cited by 21 · Related articles · All 7 versions

Gepsan: Generative procedure step anticipation in cooking videos

MA Abdelsalam, SB Rangrej, I Hadji… - Proceedings of the …, 2023 - openaccess.thecvf.com

We study the problem of future step anticipation in procedural videos. Given a video of an
ongoing procedural activity, we predict a plausible next procedure step described in rich …

Cited by 5 · Related articles · All 4 versions

Interaction region visual transformer for egocentric action anticipation

D Roy, R Rajendiran… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Human-object interaction (HOI) and temporal dynamics along the motion paths are the most
important visual cues for egocentric action anticipation. Especially, interaction regions …

Cited by 8 · Related articles · All 3 versions

GSC: A graph and spatio-temporal continuity based framework for accident anticipation

T Wang, K Chen, G Chen, B Li, Z Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Accident anticipation attempts to predict whether an accident may occur in advance, which is
greatly significant for improving the safety of intelligent vehicles. Most existing approaches …

Cited by 17 · Related articles

Learnable irrelevant modality dropout for multimodal action recognition on modality-specific annotated videos

S Alfasly, J Lu, C Xu, Y Zou - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com

With the assumption that a video dataset is multimodality annotated in which auditory and
visual modalities both are labeled or class-relevant, current multimodal methods apply …

Cited by 23 · Related articles · All 5 versions

Predicting the next action by modeling the abstract goal

D Roy, B Fernando - arXiv preprint arXiv:2209.05044, 2022 - arxiv.org

The problem of anticipating human actions is an inherently uncertain one. However, we can
reduce this uncertainty if we have a sense of the goal that the actor is trying to achieve. Here …

Cited by 18 · Related articles · All 3 versions

Streaming egocentric action anticipation: An evaluation scheme and approach

A Furnari, GM Farinella - Computer Vision and Image Understanding, 2023 - Elsevier

Egocentric action anticipation aims to predict the future actions the camera wearer will
perform from the observation of the past. While predictions about the future should be …

Cited by 3 · Related articles · All 4 versions