Revisiting skeleton-based action recognition

H Duan, Y Zhao, K Chen, D Lin… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Human skeleton, as a compact representation of human action, has received increasing
attention in recent years. Many skeleton-based action recognition methods adopt GCNs to …

MMNet: A model-based multimodal network for human action recognition in RGB-D videos

XB Bruce, Y Liu, X Zhang, S Zhong… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Human action recognition (HAR) in RGB-D videos has been widely investigated since the
release of affordable depth sensors. Currently, unimodal approaches (e.g., skeleton-based …

Transformers in action recognition: A review on temporal modeling

E Shabaninia, H Nezamabadi-pour… - arXiv preprint arXiv …, 2022 - arxiv.org
In vision-based action recognition, spatio-temporal features from different modalities are
used for recognizing activities. Temporal modeling is a long challenge of action recognition …

LAC - Latent action composition for skeleton-based action segmentation

D Yang, Y Wang, A Dantcheva… - Proceedings of the …, 2023 - openaccess.thecvf.com
Skeleton-based action segmentation requires recognizing composable actions in untrimmed
videos. Current approaches decouple this problem by first extracting local visual features …

Cross-modal learning with 3D deformable attention for action recognition

S Kim, D Ahn, BC Ko - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
An important challenge in vision-based action recognition is the embedding of
spatiotemporal features with two or more heterogeneous modalities into a single feature. In …

A large-scale study of spatiotemporal representation learning with a new benchmark on action recognition

A Deng, T Yang, C Chen - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The goal of building a benchmark (suite of datasets) is to provide a unified protocol for fair
evaluation and thus facilitate the evolution of a specific area. Nonetheless, we point out that …

Pose-based contrastive learning for domain agnostic activity representations

D Schneider, S Sarfraz, A Roitberg… - Proceedings of the …, 2022 - openaccess.thecvf.com
While recognition accuracies of video classification models trained on conventional
benchmarks are gradually saturating, recent studies raise alarm about the learned …

Learning viewpoint-agnostic visual representations by recovering tokens in 3D space

J Shang, S Das, M Ryoo - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Humans are remarkably flexible in understanding viewpoint changes due to visual cortex
supporting the perception of 3D structure. In contrast, most of the computer vision models …

Human-centric multimodal fusion network for robust action recognition

Z Hu, J Xiao, L Li, C Liu, G Ji - Expert Systems with Applications, 2024 - Elsevier
Skeleton-based methods have made remarkable strides in human action recognition (HAR).
However, the performance of existing unimodal approaches is still limited by the lack of …

Multimodal vision-based human action recognition using deep learning: a review

F Shafizadegan, AR Naghsh-Nilchi… - Artificial Intelligence …, 2024 - Springer
Vision-based Human Action Recognition (HAR) is a hot topic in computer vision.
Recently, deep-based HAR has shown promising results. HAR using a single data modality …