SlowFast networks for video recognition

C Feichtenhofer, H Fan, J Malik… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway,
operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating …

Video action transformer network

R Girdhar, J Carreira, C Doersch… - Proceedings of the …, 2019 - openaccess.thecvf.com
We introduce the Action Transformer model for recognizing and localizing human
actions in video clips. We repurpose a Transformer-style architecture to aggregate features …

Action Genome: Actions as compositions of spatio-temporal scene graphs

J Ji, R Krishna, L Fei-Fei… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Action recognition has typically treated actions and activities as monolithic events that occur
in videos. However, there is evidence from Cognitive Science and Neuroscience that people …

Long-term feature banks for detailed video understanding

CY Wu, C Feichtenhofer, H Fan, K He… - Proceedings of the …, 2019 - openaccess.thecvf.com
To understand the world, we humans constantly need to relate the present to the past, and
put events in context. In this paper, we enable existing video models to do the same. We …

Video action understanding

MS Hutchinson, VN Gadepally - IEEE Access, 2021 - ieeexplore.ieee.org
Many believe that the successes of deep learning on image understanding problems can be
replicated in the realm of video understanding. However, due to the scale and temporal …

Watch only once: An end-to-end video action detection framework

S Chen, P Sun, E Xie, C Ge, J Wu… - Proceedings of the …, 2021 - openaccess.thecvf.com
We propose an end-to-end pipeline, named Watch Once Only (WOO), for video action
detection. Current methods either decouple video action detection task into separated …

STMixer: A one-stage sparse action detector

T Wu, M Cao, Z Gao, G Wu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Traditional video action detectors typically adopt the two-stage pipeline, where a person
detector is first employed to yield actor boxes and then 3D RoIAlign is used to extract actor …

BoxSnake: Polygonal instance segmentation with box supervision

R Yang, L Song, Y Ge, X Li - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Box-supervised instance segmentation has gained much attention as it requires only simple
box annotations instead of costly mask or polygon annotations. However, existing box …

A survey on deep learning-based spatio-temporal action detection

P Wang, F Zeng, Y Qian - arXiv preprint arXiv:2308.01618, 2023 - arxiv.org
Spatio-temporal action detection (STAD) aims to classify the actions present in a video and
localize them in space and time. It has become a particularly active area of research in …

Dynamic motion representation for human action recognition

S Asghari-Esfeden, M Sznaier… - Proceedings of the …, 2020 - openaccess.thecvf.com
Despite the advances in Human Activity Recognition, the ability to exploit the dynamics of
human body motion in videos has yet to be achieved. In numerous recent works …