Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Fine-tuning multimodal LLMs to follow zero-shot demonstrative instructions
Recent advancements in Multimodal Large Language Models (MLLMs) have been utilizing
Visual Prompt Generators (VPGs) to convert visual features into tokens that LLMs can …
Fedseg: Class-heterogeneous federated learning for semantic segmentation
Federated Learning (FL) is a distributed learning paradigm that collaboratively learns a
global model across multiple clients with data privacy-preserving. Although many FL …
Winner: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …
Revisiting the domain shift and sample uncertainty in multi-source active domain transfer
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a
new target domain by actively selecting a limited number of target data to annotate. This …
Are binary annotations sufficient? video moment retrieval via hierarchical uncertainty-based active learning
Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …
ML-LJP: multi-law aware legal judgment prediction
Legal judgment prediction (LJP) is a significant task in legal intelligence, which aims to
assist the judges and determine the judgment result based on the case's fact description …
Intelligent model update strategy for sequential recommendation
Modern online platforms are increasingly employing recommendation systems to address
information overload and improve user engagement. There is an evolving paradigm in this …
Gradient-regulated meta-prompt learning for generalizable vision-language models
Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-
training models to adapt to downstream tasks in a parameter-and data-efficient way, by …
Video-audio domain generalization via confounder disentanglement
Existing video-audio understanding models are trained and evaluated in an intra-domain
setting, facing performance degeneration in real-world applications where multiple domains …