Open-vocabulary semantic segmentation via attribute decomposition-aggregation

C Ma, Y Yang, C Ju, F Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Open-vocabulary semantic segmentation is a challenging task that requires segmenting
novel object categories at inference time. Recent works explore vision-language pre-training …

Distilling vision-language pre-training to collaborate with weakly-supervised temporal action localization

C Ju, K Zheng, J Liu, P Zhao, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action
instances with only category labels. Most methods widely adopt the off-the-shelf …

D3g: Exploring gaussian prior for temporal sentence grounding with glance annotation

H Li, X Shu, S He, R Qiao, W Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Temporal sentence grounding (TSG) aims to locate a specific moment from an untrimmed
video with a given natural language query. Recently, weakly supervised methods still have a …

Audio-Visual Segmentation via Unlabeled Frame Exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Multi-modal prompting for low-shot temporal action localization

C Ju, Z Li, P Zhao, Y Zhang, X Zhang, Q Tian… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we consider the problem of temporal action localization under low-shot (zero-
shot & few-shot) scenario, with the goal of detecting and classifying the action instances from …

Multi-modal prototypes for open-set semantic segmentation

Y Yang, C Ma, C Ju, Y Zhang, Y Wang - arXiv preprint arXiv:2307.02003, 2023 - arxiv.org
In semantic segmentation, adapting a visual system to novel object categories at inference
time has always been both valuable and challenging. To enable such generalization …

Turbo: Informativity-driven acceleration plug-in for vision-language models

C Ju, H Wang, Z Li, X Chen, Z Zhai, W Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision-Language Large Models (VLMs) have become the primary backbone of AI, due to their
impressive performance. However, their expensive computation costs, i.e., throughput and …

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

C Ju, H Wang, H Cheng, X Chen, Z Zhai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Large Models (VLMs) have recently become the primary backbone of AI, due to their
impressive performance. However, their expensive computation costs, i.e., throughput and …

AttrSeg: open-vocabulary semantic segmentation via attribute decomposition-aggregation

C Ma, Y Yang, C Ju, F Zhang, Y Zhang… - … -seventh Conference on …, 2023 - openreview.net
Open-vocabulary semantic segmentation is a challenging task that requires segmenting
novel object categories at inference time. Recent works explore vision-language pre-training …

Collaborative Debias Strategy for Temporal Sentence Grounding in Video

Z Qi, Y Yuan, X Ruan, S Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Temporal sentence grounding in video has witnessed significant advancements, but suffers
from substantial dataset bias, which undermines its generalization ability. Existing debias …