Vision transformers for action recognition: A survey
Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent techniques have also proven the efficacy of transformers beyond the image domain …
k-NN attention-based video vision transformer for action recognition
Action Recognition aims to understand human behavior and predict a label for each action.
Recently, Vision Transformer (ViT) has achieved remarkable performance on action …
Transformer in Touch: A Survey
The Transformer model, initially achieving significant success in the field of natural language
processing, has recently shown great potential in the application of tactile perception. This …
Towards Long Form Audio-visual Video Understanding
We live in a world filled with never-ending streams of multimodal information. As a more
natural recording of the real scenario, long form audio-visual videos are expected as an …
Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos
Social Relation Recognition is an important part of Video Understanding, providing insights
into the information that videos convey. Most previous works mainly focused on graph …
MMSF: A multimodal sentiment-fused method to recognize video speaking style
As talking takes a large proportion of human lives, it is necessary to perform deeper
understanding of human conversations. Speaking style recognition is aimed at recognizing …
Progressive Complementation Network With Semantics and Details for Salient Object Detection in Optical Remote Sensing Images
R Zhao, P Zheng, C Zhang… - IEEE Journal of Selected …, 2024 - ieeexplore.ieee.org
The existing salient object detection in optical remote sensing images methods mostly
employ the same strategy to handle features at different levels without fully considering the …
Reproducibility Companion Paper of "MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style"
To support the replication of "MMSF: A Multimodal Sentiment-Fused Method to Recognize
Video Speaking Style", which was presented at ICMR'23, this companion paper provides the …
Real-Time Human Action Recognition on Embedded Platforms
R Wang, Z Wang, P Gao, M Li, J Jeong, Y Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
With advancements in computer vision and deep learning, video-based human action
recognition (HAR) has become practical. However, due to the complexity of the computation …
Crime Detection from Pre-crime Video Analysis with Augmented Pose and Emotion Information
S Kilic, M Tuceryan - 2024 IEEE Southwest Symposium on …, 2024 - ieeexplore.ieee.org
This study aims to detect pre-crime events in videos focusing on shoplifting. Our work
proposes a novel approach of augmenting human pose information and emotion information …