Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

ActionFormer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer
Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

Prompting visual-language models for efficient video understanding

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

TriDet: Temporal action detection with relative boundary modeling

D Shi, Y Zhong, Q Cao, L Ma, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we present a one-stage framework TriDet for temporal action detection.
Existing methods often suffer from imprecise boundary predictions due to the ambiguous …

Learning salient boundary feature for anchor-free temporal action localization

C Lin, C Xu, D Luo, Y Wang, Y Tai… - Proceedings of the …, 2021 - openaccess.thecvf.com
Temporal action localization is an important yet challenging task in video understanding.
Typically, such a task aims at inferring both the action category and localization of the start …

End-to-end temporal action detection with transformer

X Liu, Q Wang, Y Hu, X Tang, S Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …

CoLA: Weakly-supervised temporal action localization with snippet contrastive learning

C Zhang, M Cao, D Yang, J Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com
Weakly-supervised temporal action localization (WS-TAL) aims to localize actions in
untrimmed videos with only video-level labels. Most existing models follow the "localization …

Dual-evidential learning for weakly-supervised temporal action localization

M Chen, J Gao, S Yang, C Xu - European Conference on Computer Vision, 2022 - Springer
Weakly-supervised temporal action localization (WS-TAL) aims to localize the action
instances and recognize their categories with only video-level labels. Despite great …

TallFormer: Temporal Action Localization with a Long-Memory Transformer

F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern approaches in temporal action localization divide this problem into two parts: (i)
short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …

Enriching local and global contexts for temporal action localization

Z Zhu, W Tang, L Wang, N Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual
representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for …