Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …
Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …
TN-ZSTAD: Transferable network for zero-shot temporal activity detection
An integral part of video analysis and surveillance is temporal activity detection, which
means to simultaneously recognize and localize activities in long untrimmed videos …
Learning salient boundary feature for anchor-free temporal action localization
Temporal action localization is an important yet challenging task in video understanding.
Typically, such a task aims at inferring both the action category and localization of the start …
End-to-end dense video captioning with parallel decoding
Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …
End-to-end temporal action detection with transformer
Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …
BMN: Boundary-matching network for temporal action proposal generation
Temporal action proposal generation is a challenging and promising task which aims to
locate temporal regions in real-world videos where action or event may occur. Current …
G-TAD: Sub-graph localization for temporal action detection
Temporal action detection is a fundamental yet challenging task in video understanding.
Video context is a critical cue to effectively detect actions, but current works mainly focus on …
Graph convolutional networks for temporal action localization
Most state-of-the-art action localization systems process each action proposal individually,
without explicitly exploiting their relations during learning. However, the relations between …
Relaxed transformer decoders for direct action proposal generation
Temporal action proposal generation is an important and challenging task in video
understanding, which aims at detecting all temporal segments containing action instances of …