Conditional Video Diffusion Network for Fine-grained Temporal Sentence Grounding

D Liu, J Zhu, X Fang, Z Xiong, H Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding (TSG) aims to locate a semantically related segment of an
untrimmed video guided by a sentence query. Since the untrimmed videos are too long …

Towards Weakly Supervised Text-to-Audio Grounding

X Xu, Z Ma, M Wu, K Yu - arXiv preprint arXiv:2401.02584, 2024 - arxiv.org
The text-to-audio grounding (TAG) task aims to predict the onsets and offsets of sound events
described by natural language. This task can facilitate applications such as multimodal …

A Dual Reinforcement Learning Framework for Weakly Supervised Phrase Grounding

Z Wang, C Yang, B Jiang, J Yuan - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Weakly-supervised phrase grounding aims to localize a specific region in an image that
corresponds to the given textual phrase, where the mapping between noun phrases and …