Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …
Prompting visual-language models for efficient video understanding
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …
Spatio-temporal attention networks for action recognition and detection
Recently, 3D Convolutional Neural Network (3D CNN) models have been widely studied for
video sequences and achieved satisfying performance in action recognition and detection …
Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …
BasicTAD: an astounding RGB-only baseline for temporal action detection
Temporal action detection (TAD) is extensively studied in the video understanding
community by generally following the object detection pipeline in images. However, complex …
DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization
Weakly-supervised temporal action localization (WTAL) is a practical yet challenging task.
Due to large-scale datasets, most existing methods use a network pretrained in other …
Maximization and restoration: Action segmentation through dilation passing and temporal reconstruction
Action segmentation aims to split videos into segments of different actions. Recent work
focuses on dealing with long-range dependencies of long, untrimmed videos, but still suffers …
Graph attention based proposal 3D ConvNets for action detection
The recent advances in 3D Convolutional Neural Networks (3D CNNs) have shown
promising performance for untrimmed video action detection, employing the popular …
RGB stream is enough for temporal action detection
C Wang, H Cai, Y Zou, Y Xiong - arXiv preprint arXiv:2107.04362, 2021 - arxiv.org
State-of-the-art temporal action detectors to date are based on two-stream input including
RGB frames and optical flow. Although combining RGB frames and optical flow boosts …
Temporal attention-pyramid pooling for temporal action detection
MG Gan, Y Zhang - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
Temporal action detection is a challenging task in video understanding, which is usually
divided into two stages: proposal generation and classification. Learning proposal features …