VPN++: Rethinking video-pose embeddings for understanding activities of daily living

H Duan, Y Zhao, K Chen, D Lin… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Human skeleton, as a compact representation of human action, has received increasing
attention in recent years. Many skeleton-based action recognition methods adopt GCNs to …

Cited by 581 · Related articles · All 7 versions

MMNet: A model-based multimodal network for human action recognition in RGB-D videos

XB Bruce, Y Liu, X Zhang, S Zhong… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Human action recognition (HAR) in RGB-D videos has been widely investigated since the
release of affordable depth sensors. Currently, unimodal approaches (e.g., skeleton-based …

Cited by 68 · Related articles · All 5 versions

Transformers in action recognition: A review on temporal modeling

E Shabaninia, H Nezamabadi-pour… - arXiv preprint arXiv …, 2022 - arxiv.org

In vision-based action recognition, spatio-temporal features from different modalities are
used for recognizing activities. Temporal modeling is a long challenge of action recognition …

Cited by 14 · Related articles · All 2 versions

LAC: Latent action composition for skeleton-based action segmentation

D Yang, Y Wang, A Dantcheva… - Proceedings of the …, 2023 - openaccess.thecvf.com

Skeleton-based action segmentation requires recognizing composable actions in untrimmed
videos. Current approaches decouple this problem by first extracting local visual features …

Cited by 4 · Related articles · All 9 versions

Cross-modal learning with 3D deformable attention for action recognition

S Kim, D Ahn, BC Ko - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

An important challenge in vision-based action recognition is the embedding of
spatiotemporal features with two or more heterogeneous modalities into a single feature. In …

Cited by 13 · Related articles · All 5 versions

A large-scale study of spatiotemporal representation learning with a new benchmark on action recognition

A Deng, T Yang, C Chen - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

The goal of building a benchmark (suite of datasets) is to provide a unified protocol for fair
evaluation and thus facilitate the evolution of a specific area. Nonetheless, we point out that …

Cited by 9 · Related articles · All 6 versions

Pose-based contrastive learning for domain agnostic activity representations

D Schneider, S Sarfraz, A Roitberg… - Proceedings of the …, 2022 - openaccess.thecvf.com

While recognition accuracies of video classification models trained on conventional
benchmarks are gradually saturating, recent studies raise alarm about the learned …

Cited by 14 · Related articles · All 4 versions

Learning viewpoint-agnostic visual representations by recovering tokens in 3D space

J Shang, S Das, M Ryoo - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Humans are remarkably flexible in understanding viewpoint changes due to visual cortex
supporting the perception of 3D structure. In contrast, most of the computer vision models …

Cited by 13 · Related articles · All 7 versions

Human-centric multimodal fusion network for robust action recognition

Z Hu, J Xiao, L Li, C Liu, G Ji - Expert Systems with Applications, 2024 - Elsevier

Skeleton-based methods have made remarkable strides in human action recognition (HAR).
However, the performance of existing unimodal approaches is still limited by the lack of …

Cited by 5 · Related articles · All 2 versions

Multimodal vision-based human action recognition using deep learning: a review

F Shafizadegan, AR Naghsh-Nilchi… - Artificial Intelligence …, 2024 - Springer

Vision-based Human Action Recognition (HAR) is a hot topic in computer vision.
Recently, deep-based HAR has shown promising results. HAR using a single data modality …