Divide and conquer for single-frame temporal action localization

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer

Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

Cited by 294 · Related articles · All 6 versions

Weakly supervised temporal sentence grounding with gaussian-based contrastive proposal learning

M Zheng, Y Huang, Q Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com

Temporal sentence grounding aims to detect the most salient moment corresponding to the
natural language query from untrimmed videos. As labeling the temporal boundaries is labor …

Cited by 57 · Related articles · All 5 versions

Open-vocabulary semantic segmentation via attribute decomposition-aggregation

C Ma, Y Yuhuan, C Ju, F Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc

Open-vocabulary semantic segmentation is a challenging task that requires segmenting
novel object categories at inference time. Recent works explore vision-language pre-training …

Cited by 9 · Related articles · All 4 versions

Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization

C Ju, K Zheng, J Liu, P Zhao, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …

Cited by 17 · Related articles · All 6 versions

A generalized and robust framework for timestamp supervision in temporal action segmentation

R Rahaman, D Singhania, A Thiery, A Yao - European Conference on …, 2022 - Springer

In temporal action segmentation, Timestamp Supervision requires only a handful of labelled
frames per video sequence. For unlabelled frames, previous works rely on assigning hard …

Cited by 20 · Related articles · All 6 versions

DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization

X Tang, J Fan, C Luo, Z Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Weakly-supervised temporal action localization (WTAL) is a practical yet challenging task.
Due to large-scale datasets, most existing methods use a network pretrained in other …

Cited by 4 · Related articles · All 5 versions

Audio-Visual Segmentation via Unlabeled Frame Exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com

Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Cited by 2 · Related articles · All 4 versions

Prototype contrastive learning for point-supervised temporal action detection

P Li, J Cao, X Ye - Expert Systems with Applications, 2023 - Elsevier

Detecting temporal actions in a video with only single-frame annotation in each action
instance or segment, aka, point-level supervision, has emerged as a more challenging task …

Cited by 15 · Related articles · All 2 versions

Multi-modal prompting for low-shot temporal action localization

C Ju, Z Li, P Zhao, Y Zhang, X Zhang, Q Tian… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we consider the problem of temporal action localization under low-shot (zero-
shot & few-shot) scenario, with the goal of detecting and classifying the action instances from …

Cited by 13 · Related articles · All 2 versions

Compact representation and reliable classification learning for point-level weakly-supervised action localization

J Fu, J Gao, C Xu - IEEE Transactions on Image Processing, 2022 - ieeexplore.ieee.org

Point-level weakly-supervised temporal action localization (P-WSTAL) aims to localize
temporal extents of action instances and identify the corresponding categories with only a …

Cited by 11 · Related articles · All 4 versions