Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

Dynamic neural networks: A survey

Y Han, G Huang, S Song, L Yang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …

Adaptive token sampling for efficient vision transformers

M Fayyaz, SA Koohpayegani, FR Jafari… - … on Computer Vision, 2022 - Springer
While state-of-the-art vision transformer models achieve promising results in image
classification, they are computationally expensive and require many GFLOPs. Although the …

AMS-Net: Modeling adaptive multi-granularity spatio-temporal cues for video action recognition

Q Wang, Q Hu, Z Gao, P Li, Q Hu - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Effective spatio-temporal modeling as a core of video representation learning is challenged
by complex scale variations in spatio-temporal cues in videos, especially different visual …

Efficient video action detection with token dropout and context refinement

L Chen, Z Tong, Y Song, G Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Streaming video clips with large-scale video tokens impede vision transformers (ViTs) for
efficient recognition, especially in video action detection where sufficient spatiotemporal …

ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network

N Gkalelis, D Daskalakis, V Mezaris - IEEE Access, 2022 - ieeexplore.ieee.org
In this paper a pure-attention bottom-up approach, called ViGAT, that utilizes an object
detector together with a Vision Transformer (ViT) backbone network to derive object and …

Constructing better prototype generators with 3D CNNs for few-shot text classification

X Wang, Y Du, D Chen, X Li, X Chen, Y Lee… - Expert Systems with …, 2023 - Elsevier
Prototypical network is a key algorithm to solve few-shot problems. Previous prototypical
network based methods average sentence embeddings of the same class to obtain …

Uncovering the Unseen: Discover Hidden Intentions by Micro-Behavior Graph Reasoning

Z Zhou, W Liu, D Xu, Z Wang, J Zhao - Proceedings of the 31st ACM …, 2023 - dl.acm.org
This paper introduces a new and challenging Hidden Intention Discovery (HID) task. Unlike
existing intention recognition tasks, which are based on obvious visual representations to …

Identity-aware graph memory network for action detection

J Ni, J Qin, D Huang - Proceedings of the 29th ACM International …, 2021 - dl.acm.org
Action detection plays an important role in high-level video understanding and media
interpretation. Many existing studies fulfill this spatio-temporal localization by modeling the …

EPK-CLIP: External and Priori Knowledge CLIP for action recognition

Z Yang, G An, Z Zheng, S Cao, F Wang - Expert Systems with Applications, 2024 - Elsevier
Contrastive Language-Image Pretraining (CLIP) models have achieved significant
success and have markedly improved the performance of various downstream tasks …