A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021 - Elsevier
Human activity recognition (HAR) is one of the most important and challenging problems in
the computer vision. It has critical application in wide variety of tasks including gaming …

[HTML][HTML] A comprehensive survey of vision-based human action recognition methods

HB Zhang, YX Zhang, B Zhong, Q Lei, L Yang, JX Du… - Sensors, 2019 - mdpi.com
Although widely used in many applications, accurate and efficient human action recognition
remains a challenging area of research in the field of computer vision. Most recent surveys …

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Videos as space-time region graphs

X Wang, A Gupta - Proceedings of the European …, 2018 - openaccess.thecvf.com
How do humans recognize the action" opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …

Temporal action localization in untrimmed videos via multi-stage cnns

Z Shou, D Wang, SF Chang - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com
We address temporal action localization in untrimmed long videos. This is important
because videos in real applications are usually unconstrained and contain multiple action …

Long-term temporal convolutions for action recognition

G Varol, I Laptev, C Schmid - IEEE transactions on pattern …, 2017 - ieeexplore.ieee.org
Typical human actions last several seconds and exhibit characteristic spatio-temporal
structure. Recent methods attempt to capture this structure and learn action representations …

Unsupervised learning of video representations using lstms

N Srivastava, E Mansimov… - … on machine learning, 2015 - proceedings.mlr.press
Abstract We use Long Short Term Memory (LSTM) networks to learn representations of
video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed …

Going deeper into action recognition: A survey

S Herath, M Harandi, F Porikli - Image and vision computing, 2017 - Elsevier
Understanding human actions in visual data is tied to advances in complementary research
areas including object recognition, human dynamics, domain adaptation and semantic …

Learning spatiotemporal features with 3d convolutional networks

D Tran, L Bourdev, R Fergus… - Proceedings of the …, 2015 - openaccess.thecvf.com
We propose a simple, yet effective approach for spatiotemporal feature learning using deep
3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised …

Delving deeper into convolutional networks for learning video representations

N Ballas, L Yao, C Pal, A Courville - arXiv preprint arXiv:1511.06432, 2015 - arxiv.org
We propose an approach to learn spatio-temporal features in videos from intermediate
visual representations we call" percepts" using Gated-Recurrent-Unit Recurrent Networks …