End-to-end modeling via information tree for one-shot natural language spatial video grounding

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Cited by 39 Related articles All 8 versions

[PDF] openreview.net

Fine-tuning multimodal llms to follow zero-shot demonstrative instructions

J Li, K Pan, Z Ge, M Gao, W Ji, W Zhang… - The Twelfth …, 2023 - openreview.net

Recent advancements in Multimodal Large Language Models (MLLMs) have been utilizing
Visual Prompt Generators (VPGs) to convert visual features into tokens that LLMs can …

Cited by 43 Related articles All 2 versions

[PDF] thecvf.com

Fedseg: Class-heterogeneous federated learning for semantic segmentation

J Miao, Z Yang, L Fan, Y Yang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Federated Learning (FL) is a distributed learning paradigm that collaboratively learns a
global model across multiple clients with data privacy-preserving. Although many FL …

Cited by 35 Related articles All 4 versions

[PDF] thecvf.com

Winner: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Cited by 26 Related articles All 3 versions

[PDF] thecvf.com

Revisiting the domain shift and sample uncertainty in multi-source active domain transfer

W Zhang, Z Lv, H Zhou, JW Liu, J Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a
new target domain by actively selecting a limited number of target data to annotate. This …

Cited by 11 Related articles All 3 versions

[PDF] thecvf.com

Are binary annotations sufficient? video moment retrieval via hierarchical uncertainty-based active learning

W Ji, R Liang, Z Zheng, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …

Cited by 23 Related articles All 7 versions

[PDF] github.io

ML-LJP: multi-law aware legal judgment prediction

Y Liu, Y Wu, Y Zhang, C Sun, W Lu, F Wu… - Proceedings of the 46th …, 2023 - dl.acm.org

Legal judgment prediction (LJP) is a significant task in legal intelligence, which aims to
assist the judges and determine the judgment result based on the case's fact description …

Cited by 26 Related articles All 2 versions

[PDF] researchgate.net

Intelligent model update strategy for sequential recommendation

Z Lv, W Zhang, Z Chen, S Zhang, K Kuang - Proceedings of the ACM on …, 2024 - dl.acm.org

Modern online platforms are increasingly employing recommendation systems to address
information overload and improve user engagement. There is an evolving paradigm in this …

Cited by 15 Related articles All 3 versions

[PDF] thecvf.com

Gradient-regulated meta-prompt learning for generalizable vision-language models

J Li, M Gao, L Wei, S Tang, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training
models to adapt to downstream tasks in a parameter- and data-efficient way, by …

Cited by 20 Related articles All 5 versions

[PDF] aaai.org

Video-audio domain generalization via confounder disentanglement

S Zhang, X Feng, W Fan, W Fang, F Feng… - Proceedings of the …, 2023 - ojs.aaai.org

Existing video-audio understanding models are trained and evaluated in an intra-domain
setting, facing performance degeneration in real-world applications where multiple domains …

Cited by 8 Related articles All 2 versions