Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

A review of video surveillance systems

O Elharrouss, N Almaadeed, S Al-Maadeed - Journal of Visual …, 2021 - Elsevier
Automated surveillance systems observe the environment utilizing cameras. The observed
scenario is then analysed using motion detection, crowd behaviour, individual behaviour …

Multiview transformers for video recognition

S Yan, X Xiong, A Arnab, Z Lu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Video understanding requires reasoning at multiple spatiotemporal resolutions--from short
fine-grained motions to events taking place over longer durations. Although transformer …

ViViT: A video vision transformer

A Arnab, M Dehghani, G Heigold… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present pure-transformer based models for video classification, drawing upon the recent
success of such models in image classification. Our model extracts spatio-temporal tokens …

X3D: Expanding architectures for efficient video recognition

C Feichtenhofer - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com
This paper presents X3D, a family of efficient video networks that progressively expand a
tiny 2D image classification architecture along multiple network axes, in space, time, width …

TEA: Temporal excitation and aggregation for action recognition

Y Li, B Ji, X Shi, J Zhang, B Kang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Temporal modeling is key for action recognition in videos. It normally considers both
short-range motions and long-range aggregations. In this paper, we propose a Temporal …

A survey of the recent architectures of deep convolutional neural networks

A Khan, A Sohail, U Zahoora, AS Qureshi - Artificial intelligence review, 2020 - Springer
Deep Convolutional Neural Network (CNN) is a special type of Neural Networks,
which has shown exemplary performance on several competitions related to Computer …

TSM: Temporal shift module for efficient video understanding

J Lin, C Gan, S Han - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
The explosive growth in video streaming gives rise to challenges on performing video
understanding at high accuracy and low computation cost. Conventional 2D CNNs are …

STM: Spatiotemporal and motion encoding for action recognition

B Jiang, MM Wang, W Gan, W Wu… - Proceedings of the …, 2019 - openaccess.thecvf.com
Spatiotemporal and motion features are two complementary and crucial types of information for
video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn …

Self-supervised spatiotemporal learning via video clip order prediction

D Xu, J Xiao, Z Zhao, J Shao, D Xie… - Proceedings of the …, 2019 - openaccess.thecvf.com
We propose a self-supervised spatiotemporal learning technique which leverages the
chronological order of videos. Our method can learn the spatiotemporal representation of …