Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

Prompting visual-language models for efficient video understanding

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

Spatio-temporal attention networks for action recognition and detection

J Li, X Liu, W Zhang, M Zhang, J Song… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Recently, 3D Convolutional Neural Network (3D CNN) models have been widely studied for
video sequences and achieved satisfying performance in action recognition and detection …

Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization

C Ju, K Zheng, J Liu, P Zhao, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …

BasicTAD: an astounding RGB-only baseline for temporal action detection

M Yang, G Chen, YD Zheng, T Lu, L Wang - Computer Vision and Image …, 2023 - Elsevier
Temporal action detection (TAD) is extensively studied in the video understanding
community by generally following the object detection pipeline in images. However, complex …

DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization

X Tang, J Fan, C Luo, Z Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weakly-supervised temporal action localization (WTAL) is a practical yet challenging task.
Due to large-scale datasets, most existing methods use a network pretrained in other …

Maximization and restoration: Action segmentation through dilation passing and temporal reconstruction

J Park, D Kim, S Huh, S Jo - Pattern Recognition, 2022 - Elsevier
Action segmentation aims to split videos into segments of different actions. Recent work
focuses on dealing with long-range dependencies of long, untrimmed videos, but still suffers …

Graph attention based proposal 3D ConvNets for action detection

J Li, X Liu, Z Zong, W Zhao, M Zhang… - Proceedings of the AAAI …, 2020 - ojs.aaai.org
The recent advances in 3D Convolutional Neural Networks (3D CNNs) have shown
promising performance for untrimmed video action detection, employing the popular …

RGB stream is enough for temporal action detection

C Wang, H Cai, Y Zou, Y Xiong - arXiv preprint arXiv:2107.04362, 2021 - arxiv.org
State-of-the-art temporal action detectors to date are based on two-stream input including
RGB frames and optical flow. Although combining RGB frames and optical flow boosts …

Temporal attention-pyramid pooling for temporal action detection

MG Gan, Y Zhang - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
Temporal action detection is a challenging task in video understanding, which is usually
divided into two stages: proposal generation and classification. Learning proposal features …