Human action recognition from various data modalities: A review
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …
A review of video surveillance systems
Automated surveillance systems observe the environment utilizing cameras. The observed
scenario is then analysed using motion detection, crowd behaviour, individual behaviour …
Multiview transformers for video recognition
Video understanding requires reasoning at multiple spatiotemporal resolutions--from short
fine-grained motions to events taking place over longer durations. Although transformer …
ViViT: A video vision transformer
We present pure-transformer based models for video classification, drawing upon the recent
success of such models in image classification. Our model extracts spatio-temporal tokens …
X3D: Expanding architectures for efficient video recognition
C Feichtenhofer - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com
This paper presents X3D, a family of efficient video networks that progressively expand a
tiny 2D image classification architecture along multiple network axes, in space, time, width …
TEA: Temporal excitation and aggregation for action recognition
Temporal modeling is key for action recognition in videos. It normally considers both short-
range motions and long-range aggregations. In this paper, we propose a Temporal …
A survey of the recent architectures of deep convolutional neural networks
Deep Convolutional Neural Network (CNN) is a special type of Neural Networks,
which has shown exemplary performance on several competitions related to Computer …
TSM: Temporal shift module for efficient video understanding
The explosive growth in video streaming gives rise to challenges on performing video
understanding at high accuracy and low computation cost. Conventional 2D CNNs are …
STM: Spatiotemporal and motion encoding for action recognition
Spatiotemporal and motion features are two complementary and crucial information for
video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn …
Self-supervised spatiotemporal learning via video clip order prediction
We propose a self-supervised spatiotemporal learning technique which leverages the
chronological order of videos. Our method can learn the spatiotemporal representation of …