Revisiting anchor mechanisms for temporal action localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org

Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Cited by 30 Related articles All 4 versions

[PDF] arxiv.org

Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org

Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

Cited by 64 Related articles All 8 versions

[PDF] arxiv.org

ActionFormer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer

Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

Cited by 375 Related articles All 7 versions

[PDF] thecvf.com

TriDet: Temporal action detection with relative boundary modeling

D Shi, Y Zhong, Q Cao, L Ma, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this paper, we present a one-stage framework TriDet for temporal action detection.
Existing methods often suffer from imprecise boundary predictions due to the ambiguous …

Cited by 128 Related articles All 5 versions

[PDF] arxiv.org

Prompting visual-language models for efficient video understanding

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer

Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

Cited by 366 Related articles All 6 versions

[PDF] thecvf.com

Learning salient boundary feature for anchor-free temporal action localization

C Lin, C Xu, D Luo, Y Wang, Y Tai… - Proceedings of the …, 2021 - openaccess.thecvf.com

Temporal action localization is an important yet challenging task in video understanding.
Typically, such a task aims at inferring both the action category and localization of the start …

Cited by 295 Related articles All 5 versions

[PDF] arxiv.org

End-to-end temporal action detection with transformer

X Liu, Q Wang, Y Hu, X Tang, S Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …

Cited by 240 Related articles All 5 versions

[PDF] thecvf.com

UnLoc: A unified framework for video localization tasks

S Yan, X Xiong, A Nagrani, A Arnab… - Proceedings of the …, 2023 - openaccess.thecvf.com

While large-scale image-text pretrained models such as CLIP have been used for multiple
video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos …

Cited by 35 Related articles All 6 versions

[PDF] thecvf.com

The 7th AI City Challenge

M Naphade, S Wang, DC Anastasiu… - Proceedings of the …, 2023 - openaccess.thecvf.com

The AI City Challenge's seventh edition emphasizes two domains at the intersection
of computer vision and artificial intelligence: retail business and Intelligent Traffic Systems …

Cited by 263 Related articles All 26 versions

[PDF] liplus.me

DSNet: A flexible detect-to-summarize network for video summarization

W Zhu, J Lu, J Li, J Zhou - IEEE Transactions on Image …, 2020 - ieeexplore.ieee.org

In this paper, we propose a Detect-to-Summarize network (DSNet) framework for supervised
video summarization. Our DSNet contains anchor-based and anchor-free counterparts. The …

Cited by 155 Related articles All 7 versions