Weakly-supervised video object grounding via learning uni-modal associations

3 results (0.04 seconds)

Conditional Video Diffusion Network for Fine-grained Temporal Sentence Grounding

D Liu, J Zhu, X Fang, Z Xiong, H Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding (TSG) aims to locate a semantically related segment of an
untrimmed video guided by a sentence query. Since the untrimmed videos are too long …

Cited by 3

Towards Weakly Supervised Text-to-Audio Grounding

X Xu, Z Ma, M Wu, K Yu - arXiv preprint arXiv:2401.02584, 2024 - arxiv.org

Text-to-audio grounding (TAG) task aims to predict the onsets and offsets of sound events
described by natural language. This task can facilitate applications such as multimodal …

Cited by 5 (2 versions)

A dual reinforcement learning framework for weakly supervised phrase grounding

Z Wang, C Yang, B Jiang, J Yuan - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Weakly-supervised phrase grounding aims to localize a specific region in an image that
corresponds to the given textual phrase, where the mapping between noun phrases and …

Cited by 2 (2 versions)