Beyond gaussian pyramid: Multi-skip feature stacking for action recognition

A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021 - Elsevier

Human activity recognition (HAR) is one of the most important and challenging problems in
computer vision. It has critical applications in a wide variety of tasks including gaming …

Cited by 217 · Related articles · All 3 versions

[HTML] mdpi.com

[HTML] A comprehensive survey of vision-based human action recognition methods

HB Zhang, YX Zhang, B Zhong, Q Lei, L Yang, JX Du… - Sensors, 2019 - mdpi.com

Although widely used in many applications, accurate and efficient human action recognition
remains a challenging area of research in the field of computer vision. Most recent surveys …

Cited by 544 · Related articles · All 9 versions

[PDF] arxiv.org

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org

Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Cited by 210 · Related articles · All 2 versions

[PDF] thecvf.com

Videos as space-time region graphs

X Wang, A Gupta - Proceedings of the European …, 2018 - openaccess.thecvf.com

How do humans recognize the action "opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …

Cited by 877 · Related articles · All 10 versions

[PDF] thecvf.com

Temporal action localization in untrimmed videos via multi-stage cnns

Z Shou, D Wang, SF Chang - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com

We address temporal action localization in untrimmed long videos. This is important
because videos in real applications are usually unconstrained and contain multiple action …

Cited by 1129 · Related articles · All 12 versions

[PDF] arxiv.org

Long-term temporal convolutions for action recognition

G Varol, I Laptev, C Schmid - IEEE transactions on pattern …, 2017 - ieeexplore.ieee.org

Typical human actions last several seconds and exhibit characteristic spatio-temporal
structure. Recent methods attempt to capture this structure and learn action representations …

Cited by 1153 · Related articles · All 14 versions

[PDF] mlr.press

Unsupervised learning of video representations using lstms

N Srivastava, E Mansimov… - … on machine learning, 2015 - proceedings.mlr.press

We use Long Short Term Memory (LSTM) networks to learn representations of
video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed …

Cited by 3275 · Related articles · All 19 versions

[PDF] arxiv.org

Going deeper into action recognition: A survey

S Herath, M Harandi, F Porikli - Image and vision computing, 2017 - Elsevier

Understanding human actions in visual data is tied to advances in complementary research
areas including object recognition, human dynamics, domain adaptation and semantic …

Cited by 795 · Related articles · All 8 versions

[PDF] thecvf.com

Learning spatiotemporal features with 3d convolutional networks

D Tran, L Bourdev, R Fergus… - Proceedings of the …, 2015 - openaccess.thecvf.com

We propose a simple, yet effective approach for spatiotemporal feature learning using deep
3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised …

Cited by 10400 · Related articles · All 13 versions

[PDF] arxiv.org

Delving deeper into convolutional networks for learning video representations

N Ballas, L Yao, C Pal, A Courville - arXiv preprint arXiv:1511.06432, 2015 - arxiv.org

We propose an approach to learn spatio-temporal features in videos from intermediate
visual representations we call "percepts" using Gated-Recurrent-Unit Recurrent Networks …

Cited by 866 · Related articles · All 5 versions