Anticipative video transformer
Abstract We propose Anticipative Video Transformer (AVT), an end-to-end attention-based
video modeling architecture that attends to the previously observed video in order to …
Anticipative feature fusion transformer for multi-modal action anticipation
Z Zhong, D Schneider, M Voit… - Proceedings of the …, 2023 - openaccess.thecvf.com
Although human action anticipation is a task which is inherently multi-modal, state-of-the-art
methods on well known action anticipation datasets leverage this data by applying …
Latency matters: Real-time action forecasting transformer
We present RAFTformer, a real-time action forecasting transformer for latency aware real-
world action forecasting applications. RAFTformer is a two-stage fully transformer based …
Rethinking learning approaches for long-term action anticipation
Action anticipation involves predicting future actions having observed the initial portion of a
video. Typically, the observed video is processed as a whole to obtain a video-level …
Gepsan: Generative procedure step anticipation in cooking videos
We study the problem of future step anticipation in procedural videos. Given a video of an
ongoing procedural activity, we predict a plausible next procedure step described in rich …
Interaction region visual transformer for egocentric action anticipation
D Roy, R Rajendiran… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Human-object interaction (HOI) and temporal dynamics along the motion paths are the most
important visual cues for egocentric action anticipation. Especially, interaction regions …
GSC: A graph and spatio-temporal continuity based framework for accident anticipation
Accident anticipation attempts to predict whether an accident may occur in advance, which is
greatly significant for improving the safety of intelligent vehicles. Most existing approaches …
Learnable irrelevant modality dropout for multimodal action recognition on modality-specific annotated videos
S Alfasly, J Lu, C Xu, Y Zou - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
With the assumption that a video dataset is multimodality annotated in which auditory and
visual modalities both are labeled or class-relevant, current multimodal methods apply …
Predicting the next action by modeling the abstract goal
D Roy, B Fernando - arXiv preprint arXiv:2209.05044, 2022 - arxiv.org
The problem of anticipating human actions is an inherently uncertain one. However, we can
reduce this uncertainty if we have a sense of the goal that the actor is trying to achieve. Here …
Streaming egocentric action anticipation: An evaluation scheme and approach
A Furnari, GM Farinella - Computer Vision and Image Understanding, 2023 - Elsevier
Egocentric action anticipation aims to predict the future actions the camera wearer will
perform from the observation of the past. While predictions about the future should be …