Bottom-up temporal action localization with mutual regularization

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org

Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

被引用次数：55 相关文章所有 8 个版本

[PDF] arxiv.org

Actionformer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer

Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

被引用次数：292 相关文章所有 7 个版本

[PDF] arxiv.org

Prompting visual-language models for efficient video understanding

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer

Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

被引用次数：294 相关文章所有 6 个版本

[PDF] thecvf.com

Tridet: Temporal action detection with relative boundary modeling

D Shi, Y Zhong, Q Cao, L Ma, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this paper, we present a one-stage framework TriDet for temporal action detection.
Existing methods often suffer from imprecise boundary predictions due to the ambiguous …

被引用次数：81 相关文章所有 5 个版本

[PDF] thecvf.com

Learning salient boundary feature for anchor-free temporal action localization

C Lin, C Xu, D Luo, Y Wang, Y Tai… - Proceedings of the …, 2021 - openaccess.thecvf.com

Temporal action localization is an important yet challenging task in video understanding.
Typically, such a task aims at inferring both the action category and localization of the start …

被引用次数：260 相关文章所有 5 个版本

[PDF] arxiv.org

End-to-end temporal action detection with transformer

X Liu, Q Wang, Y Hu, X Tang, S Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …

被引用次数：189 相关文章所有 5 个版本

[PDF] thecvf.com

Cola: Weakly-supervised temporal action localization with snippet contrastive learning

C Zhang, M Cao, D Yang, J Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com

Weakly-supervised temporal action localization (WS-TAL) aims to localize actions in
untrimmed videos with only video-level labels. Most existing models follow the" localization …

被引用次数：141 相关文章所有 7 个版本

[PDF] ecva.net

Dual-evidential learning for weakly-supervised temporal action localization

M Chen, J Gao, S Yang, C Xu - European conference on computer vision, 2022 - Springer

Weakly-supervised temporal action localization (WS-TAL) aims to localize the action
instances and recognize their categories with only video-level labels. Despite great …

被引用次数：50 相关文章所有 5 个版本

[PDF] arxiv.org

TallFormer: Temporal Action Localization with a Long-Memory Transformer

F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer

Most modern approaches in temporal action localization divide this problem into two parts:(i)
short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …

被引用次数：82 相关文章所有 5 个版本

[PDF] thecvf.com

Enriching local and global contexts for temporal action localization

Z Zhu, W Tang, L Wang, N Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com

Effectively tackling the problem of temporal action localization (TAL) necessitates a visual
representation that jointly pursues two confounding goals, ie, fine-grained discrimination for …

被引用次数：113 相关文章所有 6 个版本