Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …
Cola: Weakly-supervised temporal action localization with snippet contrastive learning
Weakly-supervised temporal action localization (WS-TAL) aims to localize actions in
untrimmed videos with only video-level labels. Most existing models follow the "localization …
Object-region video transformers
R Herzig, E Ben-Avraham… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, video transformers have shown great success in video understanding, exceeding
CNN performance; yet existing video transformer models do not explicitly model objects …
Tsp: Temporally-sensitive pretraining of video encoders for localization tasks
H Alwassel, S Giancola… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Due to the large memory footprint of untrimmed videos, current state-of-the-art video
localization methods operate atop precomputed video clip features. These features are …
Learning action completeness from points for weakly-supervised temporal action localization
We tackle the problem of localizing temporal intervals of actions with only a single frame
label for each action instance for training. Owing to label sparsity, existing work fails to learn …
Cross-modal consensus network for weakly supervised temporal action localization
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to
localize action instances in the given video with video-level categorical supervision …
Background-click supervision for temporal action localization
Weakly supervised temporal action localization aims at learning the instance-level action
pattern from the video-level labels, where a significant challenge is action-context confusion …
Learning to refactor action and co-occurrence features for temporal action localization
The main challenge of Temporal Action Localization is to retrieve subtle human actions from
various co-occurring ingredients, e.g., context and background, in an untrimmed video. While …
Weakly-supervised temporal action localization by uncertainty modeling
Weakly-supervised temporal action localization aims to learn to detect temporal intervals of
action classes with only video-level labels. To this end, it is crucial to separate frames of …
Activity graph transformer for temporal action localization
We introduce Activity Graph Transformer, an end-to-end learnable model for temporal action
localization, that receives a video as input and directly predicts a set of action instances that …