Attentive spatio-temporal representation learning for diving classification

M Yao, H Gao, G Zhao, D Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com

How to effectively and efficiently deal with spatio-temporal event streams, where the events
are generally sparse and non-uniform and have the μs temporal resolution, is of great value …

Cited by 137 · Related articles · All 7 versions

[PDF] thecvf.com

SkeletonMAE: Graph-based masked autoencoder for skeleton sequence pre-training

H Yan, Y Liu, Y Wei, Z Li, G Li… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Skeleton sequence representation learning has shown great advantages for action
recognition due to its promising ability to model human joints and topology. However, the …

Cited by 25 · Related articles · All 5 versions

Transfer learning and its extensive appositeness in human activity recognition: A survey

A Ray, MH Kolekar - Expert Systems with Applications, 2023 - Elsevier

In this competitive world, the supervision and monitoring of human resources are primary
and necessary tasks to drive context-aware applications. Advancement in sensor and …

Cited by 4 · Related articles

[PDF] thecvf.com

Gate-shift networks for video action recognition

S Sudhakaran, S Escalera… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Deep 3D CNNs for video action recognition are designed to learn powerful representations
in the joint spatio-temporal feature space. In practice however, because of the large number …

Cited by 183 · Related articles · All 14 versions

[PDF] arxiv.org

A survey on video action recognition in sports: Datasets, methods and applications

F Wu, Q Wang, J Bian, N Ding, F Lu… - IEEE Transactions …, 2022 - ieeexplore.ieee.org

To understand human behaviors, action recognition based on videos is a common
approach. Compared with image-based action recognition, videos provide much more …

Cited by 46 · Related articles · All 4 versions

[PDF] thecvf.com

Temporal query networks for fine-grained video understanding

C Zhang, A Gupta, A Zisserman - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Our objective in this work is fine-grained classification of actions in untrimmed videos, where
the actions may be temporally extended or may span only a few frames of the video. We cast …

Cited by 82 · Related articles · All 12 versions

[PDF] thecvf.com

Removing the background by adding the background: Towards background robust self-supervised video representation learning

J Wang, Y Gao, K Li, Y Lin, AJ Ma… - Proceedings of the …, 2021 - openaccess.thecvf.com

Self-supervised learning has shown great potentials in improving the video representation
ability of deep neural networks by getting supervision from the data itself. However, some of …

Cited by 95 · Related articles · All 7 versions

[PDF] thecvf.com

Video modeling with correlation networks

H Wang, D Tran, L Torresani… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Motion is a salient cue to recognize actions in video. Modern action recognition models
leverage motion information either explicitly by using optical flow as input or implicitly by …

Cited by 157 · Related articles · All 8 versions

[PDF] arxiv.org

SportsCap: Monocular 3D human motion capture and fine-grained understanding in challenging sports videos

X Chen, A Pang, W Yang, Y Ma, L Xu, J Yu - International Journal of …, 2021 - Springer

Markerless motion capture and understanding of professional non-daily human movements
is an important yet unsolved task, which suffers from complex motion patterns and severe …

Cited by 45 · Related articles · All 10 versions

[PDF] arxiv.org

Depthwise spatio-temporal STFT convolutional neural networks for human action recognition

S Kumawat, M Verma, Y Nakashima… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Conventional 3D convolutional neural networks (CNNs) are computationally expensive,
memory intensive, prone to overfitting, and most importantly, there is a need to improve their …

Cited by 51 · Related articles · All 8 versions