RGB-D data-based action recognition: a review

MB Shaikh, D Chai - Sensors, 2021 - mdpi.com
Classification of human actions is an ongoing research problem in computer vision. This
review is aimed to scope current literature on data fusion and action recognition techniques …

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Review of dynamic gesture recognition

SHI Yuanyuan, LI Yunan, FU Xiaolong, M Kaibin… - Virtual Reality & …, 2021 - Elsevier
In recent years, gesture recognition has been widely used in the fields of intelligent driving,
virtual reality, and human-computer interaction. With the development of artificial …

Recurring the transformer for video action recognition

J Yang, X Dong, L Liu, C Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Existing video understanding approaches, such as 3D convolutional neural networks and
Transformer-Based methods, usually process the videos in a clip-wise manner. Hence huge …

TSM: Temporal shift module for efficient video understanding

J Lin, C Gan, S Han - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
The explosive growth in video streaming gives rise to challenges on performing video
understanding at high accuracy and low computation cost. Conventional 2D CNNs are …

Action-net: Multipath excitation for action recognition

Z Wang, Q She, A Smolic - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Spatial-temporal, channel-wise, and motion patterns are three complementary and
crucial types of information for video action recognition. Conventional 2D CNNs are …

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

S Albanie, G Varol, L Momeni, T Afouras… - Computer Vision–ECCV …, 2020 - Springer
Recent progress in fine-grained gesture and action classification, and machine translation,
point to the possibility of automated sign language recognition becoming a reality. A key …

Direcformer: A directed attention in transformer approach to robust action recognition

TD Truong, QH Bui, CN Duong… - Proceedings of the …, 2022 - openaccess.thecvf.com
Human action recognition has recently become one of the popular research topics in the
computer vision community. Various 3D-CNN based methods have been presented to tackle …

Something-else: Compositional action recognition with spatial-temporal interaction networks

J Materzynska, T Xiao, R Herzig, H Xu… - Proceedings of the …, 2020 - openaccess.thecvf.com
Human action is naturally compositional: humans can easily recognize and perform actions
with objects that are different from those used in training demonstrations. In this paper, we …

Attention clusters: Purely attention based local feature integration for video classification

X Long, C Gan, G De Melo, J Wu… - Proceedings of the …, 2018 - openaccess.thecvf.com
Recently, substantial research effort has focused on how to apply CNNs or RNNs to better
capture temporal patterns in videos, so as to improve the accuracy of video classification. In …