Human action recognition from various data modalities: A review
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …
Dynamic neural networks: A survey
Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …
Adaptive token sampling for efficient vision transformers
While state-of-the-art vision transformer models achieve promising results in image
classification, they are computationally expensive and require many GFLOPs. Although the …
Ams-net: Modeling adaptive multi-granularity spatio-temporal cues for video action recognition
Effective spatio-temporal modeling as a core of video representation learning is challenged
by complex scale variations in spatio-temporal cues in videos, especially different visual …
Efficient video action detection with token dropout and context refinement
L Chen, Z Tong, Y Song, G Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Streaming video clips with large-scale video tokens impede vision transformers (ViTs) for
efficient recognition, especially in video action detection where sufficient spatiotemporal …
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
N Gkalelis, D Daskalakis, V Mezaris - IEEE Access, 2022 - ieeexplore.ieee.org
In this paper a pure-attention bottom-up approach, called ViGAT, that utilizes an object
detector together with a Vision Transformer (ViT) backbone network to derive object and …
Constructing better prototype generators with 3D CNNs for few-shot text classification
Prototypical network is a key algorithm to solve few-shot problems. Previous prototypical
network based methods average sentence embeddings of the same class to obtain …
Uncovering the Unseen: Discover Hidden Intentions by Micro-Behavior Graph Reasoning
This paper introduces a new and challenging Hidden Intention Discovery (HID) task. Unlike
existing intention recognition tasks, which are based on obvious visual representations to …
Identity-aware graph memory network for action detection
Action detection plays an important role in high-level video understanding and media
interpretation. Many existing studies fulfill this spatio-temporal localization by modeling the …
EPK-CLIP: External and Priori Knowledge CLIP for action recognition
Contrastive Language-Image Pretraining (CLIP) models have achieved significant
success and have markedly improved the performance of various downstream tasks …