Human centric spatio-temporal action localization

C Feichtenhofer, H Fan, J Malik… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway,
operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating …

Cited by 3755 Related articles All 11 versions

Video action transformer network

R Girdhar, J Carreira, C Doersch… - Proceedings of the …, 2019 - openaccess.thecvf.com

Abstract We introduce the Action Transformer model for recognizing and localizing human
actions in video clips. We repurpose a Transformer-style architecture to aggregate features …

Cited by 866 Related articles All 11 versions

Action genome: Actions as compositions of spatio-temporal scene graphs

J Ji, R Krishna, L Fei-Fei… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Action recognition has typically treated actions and activities as monolithic events that occur
in videos. However, there is evidence from Cognitive Science and Neuroscience that people …

Cited by 365 Related articles All 9 versions

Long-term feature banks for detailed video understanding

CY Wu, C Feichtenhofer, H Fan, K He… - Proceedings of the …, 2019 - openaccess.thecvf.com

To understand the world, we humans constantly need to relate the present to the past, and
put events in context. In this paper, we enable existing video models to do the same. We …

Cited by 579 Related articles All 10 versions

Video action understanding

MS Hutchinson, VN Gadepally - IEEE Access, 2021 - ieeexplore.ieee.org

Many believe that the successes of deep learning on image understanding problems can be
replicated in the realm of video understanding. However, due to the scale and temporal …

Cited by 36 Related articles All 7 versions

Watch only once: An end-to-end video action detection framework

S Chen, P Sun, E Xie, C Ge, J Wu… - Proceedings of the …, 2021 - openaccess.thecvf.com

We propose an end-to-end pipeline, named Watch Once Only (WOO), for video action
detection. Current methods either decouple video action detection task into separated …

Cited by 64 Related articles All 4 versions

STMixer: A one-stage sparse action detector

T Wu, M Cao, Z Gao, G Wu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Traditional video action detectors typically adopt the two-stage pipeline, where a person
detector is first employed to yield actor boxes and then 3D RoIAlign is used to extract actor …

Cited by 21 Related articles All 11 versions

BoxSnake: Polygonal instance segmentation with box supervision

R Yang, L Song, Y Ge, X Li - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Box-supervised instance segmentation has gained much attention as it requires only simple
box annotations instead of costly mask or polygon annotations. However, existing box …

Cited by 20 Related articles All 7 versions

A survey on deep learning-based spatio-temporal action detection

P Wang, F Zeng, Y Qian - arXiv preprint arXiv:2308.01618, 2023 - arxiv.org

Spatio-temporal action detection (STAD) aims to classify the actions present in a video and
localize them in space and time. It has become a particularly active area of research in …

Cited by 3 Related articles All 3 versions

Dynamic motion representation for human action recognition

S Asghari-Esfeden, M Sznaier… - Proceedings of the …, 2020 - openaccess.thecvf.com

Despite the advances in Human Activity Recognition, the ability to exploit the dynamics of
human body motion in videos has yet to be achieved. In numerous recent works …

Cited by 57 Related articles All 4 versions