Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Fine-tuning multimodal LLMs to follow zero-shot demonstrative instructions

J Li, K Pan, Z Ge, M Gao, W Ji, W Zhang… - The Twelfth …, 2023 - openreview.net
Recent advancements in Multimodal Large Language Models (MLLMs) have been utilizing
Visual Prompt Generators (VPGs) to convert visual features into tokens that LLMs can …

FedSeg: Class-heterogeneous federated learning for semantic segmentation

J Miao, Z Yang, L Fan, Y Yang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Federated Learning (FL) is a distributed learning paradigm that collaboratively learns a
global model across multiple clients while preserving data privacy. Although many FL …

WINNER: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Revisiting the domain shift and sample uncertainty in multi-source active domain transfer

W Zhang, Z Lv, H Zhou, JW Liu, J Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a
new target domain by actively selecting a limited number of target data to annotate. This …

Are binary annotations sufficient? Video moment retrieval via hierarchical uncertainty-based active learning

W Ji, R Liang, Z Zheng, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …

ML-LJP: multi-law aware legal judgment prediction

Y Liu, Y Wu, Y Zhang, C Sun, W Lu, F Wu… - Proceedings of the 46th …, 2023 - dl.acm.org
Legal judgment prediction (LJP) is a significant task in legal intelligence, which aims to
assist the judges and determine the judgment result based on the case's fact description …

Intelligent model update strategy for sequential recommendation

Z Lv, W Zhang, Z Chen, S Zhang, K Kuang - Proceedings of the ACM on …, 2024 - dl.acm.org
Modern online platforms are increasingly employing recommendation systems to address
information overload and improve user engagement. There is an evolving paradigm in this …

Gradient-regulated meta-prompt learning for generalizable vision-language models

J Li, M Gao, L Wei, S Tang, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-
training models to adapt to downstream tasks in a parameter-and data-efficient way, by …

Video-audio domain generalization via confounder disentanglement

S Zhang, X Feng, W Fan, W Fang, F Feng… - Proceedings of the …, 2023 - ojs.aaai.org
Existing video-audio understanding models are trained and evaluated in an intra-domain
setting, facing performance degeneration in real-world applications where multiple domains …