Prompting visual-language models for efficient video understanding
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …
Overview of temporal action detection based on deep learning
K Hu, C Shen, T Wang, K Xu, Q Xia, M Xia… - Artificial Intelligence …, 2024 - Springer
Temporal Action Detection (TAD) aims to accurately capture each action interval in
an untrimmed video and to understand human actions. This paper comprehensively surveys …
Self-feedback DETR for temporal action detection
Temporal Action Detection (TAD) is challenging but fundamental for real-world
video applications. Recently, DETR-based models have been devised for TAD but have not …
DiffTAD: Temporal action detection with proposal denoising diffusion
We propose a new formulation of temporal action detection (TAD) with denoising diffusion,
DiffTAD in short. Taking as input random temporal proposals, it can yield action proposals …
Decomposed cross-modal distillation for RGB-based temporal action detection
Temporal action detection aims to predict the time intervals and the classes of action
instances in the video. Despite the promising performance, existing two-stream models …
Geostatistical modeling approach for studying total soil nitrogen and phosphorus under various land uses of North-Western Himalayas
The distribution of total soil nitrogen (TSN) and total soil phosphorus (TSP) plays a pivotal
role in shaping soil quality, fertility, agricultural practices, and environmental balance …
Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …
Hierarchical local-global transformer for temporal sentence grounding
This article studies the multimedia problem of temporal sentence grounding (TSG), which
aims to accurately determine the specific video segment in an untrimmed video according to …
Action sensitivity learning for temporal action localization
Temporal action localization (TAL), which involves recognizing and locating action
instances, is a challenging task in video understanding. Most existing approaches directly …
Learning from noisy pseudo labels for semi-supervised temporal action localization
Semi-Supervised Temporal Action Localization (SS-TAL) aims to improve the
generalization ability of action detectors with large-scale unlabeled videos. Albeit the recent …