Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates the advancement of numerous real-world
applications and is critical for video analysis. Despite the progress of action recognition …
ActionFormer: Localizing moments of actions with transformers
Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …
Prompting visual-language models for efficient video understanding
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …
TriDet: Temporal action detection with relative boundary modeling
In this paper, we present a one-stage framework TriDet for temporal action detection.
Existing methods often suffer from imprecise boundary predictions due to the ambiguous …
Learning salient boundary feature for anchor-free temporal action localization
Temporal action localization is an important yet challenging task in video understanding.
Typically, such a task aims at inferring both the action category and localization of the start …
End-to-end temporal action detection with transformer
Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …
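The TAD task definition above (a semantic label plus a temporal interval for every action instance) can be made concrete with a minimal sketch. Assumptions: the `ActionInstance` type, the `is_true_positive` rule, and the 0.5 threshold are hypothetical illustrations, not taken from any of the listed papers; temporal IoU (tIoU) is the interval-overlap measure commonly used to match predicted and ground-truth instances when evaluating TAD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionInstance:
    """Hypothetical container: one action's semantic label and its interval in seconds."""
    label: str
    start: float
    end: float


def temporal_iou(a: ActionInstance, b: ActionInstance) -> float:
    """Intersection over union of two 1-D time intervals (labels are ignored here)."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: ActionInstance, gt: ActionInstance, threshold: float = 0.5) -> bool:
    """Hypothetical matching rule: same label and tIoU at or above the threshold."""
    return pred.label == gt.label and temporal_iou(pred, gt) >= threshold


gt = ActionInstance("high_jump", 10.0, 14.0)
pred = ActionInstance("high_jump", 11.0, 15.0)
print(temporal_iou(pred, gt))  # 0.6 (3 s overlap / 5 s union)
```

Checking both the label and the interval reflects the two parts of the task definition: a prediction with the right class but poorly placed boundaries, or well-placed boundaries but the wrong class, would not count as a match under this rule.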
CoLA: Weakly-supervised temporal action localization with snippet contrastive learning
Weakly-supervised temporal action localization (WS-TAL) aims to localize actions in
untrimmed videos with only video-level labels. Most existing models follow the "localization …
Dual-evidential learning for weakly-supervised temporal action localization
Weakly-supervised temporal action localization (WS-TAL) aims to localize the action
instances and recognize their categories with only video-level labels. Despite great …
TallFormer: Temporal Action Localization with a Long-Memory Transformer
F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern approaches in temporal action localization divide this problem into two parts: (i)
short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …
Enriching local and global contexts for temporal action localization
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual
representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for …