Revisiting skeleton-based action recognition
Human skeleton, as a compact representation of human action, has received increasing
attention in recent years. Many skeleton-based action recognition methods adopt GCNs to …
MMNet: A model-based multimodal network for human action recognition in RGB-D videos
Human action recognition (HAR) in RGB-D videos has been widely investigated since the
release of affordable depth sensors. Currently, unimodal approaches (e.g., skeleton-based …
Transformers in action recognition: A review on temporal modeling
E Shabaninia, H Nezamabadi-pour… - arXiv preprint arXiv …, 2022 - arxiv.org
In vision-based action recognition, spatio-temporal features from different modalities are
used for recognizing activities. Temporal modeling is a long-standing challenge of action recognition …
LAC - Latent action composition for skeleton-based action segmentation
Skeleton-based action segmentation requires recognizing composable actions in untrimmed
videos. Current approaches decouple this problem by first extracting local visual features …
Cross-modal learning with 3D deformable attention for action recognition
An important challenge in vision-based action recognition is the embedding of
spatiotemporal features with two or more heterogeneous modalities into a single feature. In …
A large-scale study of spatiotemporal representation learning with a new benchmark on action recognition
The goal of building a benchmark (suite of datasets) is to provide a unified protocol for fair
evaluation and thus facilitate the evolution of a specific area. Nonetheless, we point out that …
Pose-based contrastive learning for domain agnostic activity representations
While recognition accuracies of video classification models trained on conventional
benchmarks are gradually saturating, recent studies raise alarm about the learned …
Learning viewpoint-agnostic visual representations by recovering tokens in 3D space
Humans are remarkably flexible in understanding viewpoint changes due to visual cortex
supporting the perception of 3D structure. In contrast, most of the computer vision models …
Human-centric multimodal fusion network for robust action recognition
Z Hu, J Xiao, L Li, C Liu, G Ji - Expert Systems with Applications, 2024 - Elsevier
Skeleton-based methods have made remarkable strides in human action recognition (HAR).
However, the performance of existing unimodal approaches is still limited by the lack of …
Multimodal vision-based human action recognition using deep learning: a review
F Shafizadegan, AR Naghsh-Nilchi… - Artificial Intelligence …, 2024 - Springer
Vision-based Human Action Recognition (HAR) is a hot topic in computer vision.
Recently, deep-based HAR has shown promising results. HAR using a single data modality …