A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

TubeDETR: Spatio-temporal video grounding with transformers

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2022 - openaccess.thecvf.com
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a
given text query. This is a challenging task that requires the joint and efficient modeling of …

Can I trust your answer? Visually grounded video question answering

J Xiao, A Yao, Y Li, TS Chua - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
We study visually grounded VideoQA in response to the emerging trends of utilizing
pretraining techniques for video-language understanding. Specifically by forcing vision …

Injecting semantic concepts into end-to-end image captioning

Z Fang, J Wang, X Hu, L Liang, Z Gan… - Proceedings of the …, 2022 - openaccess.thecvf.com
Tremendous progress has been made in recent years in developing better image captioning
models, yet most of them rely on a separate object detector to extract regional features …

Interventional video grounding with dual contrastive learning

G Nan, R Qiao, Y Xiao, J Liu, S Leng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Video grounding aims to localize a moment from an untrimmed video for a given textual
query. Existing approaches focus more on the alignment of visual and language stimuli with …

Context-aware biaffine localizing network for temporal sentence grounding

D Liu, X Qu, J Dong, P Zhou, Y Cheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper addresses the problem of temporal sentence grounding (TSG), which aims to
identify the temporal boundary of a specific segment from an untrimmed video by a sentence …

Fast video moment retrieval

J Gao, C Xu - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
This paper targets at fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …

Weakly supervised temporal sentence grounding with Gaussian-based contrastive proposal learning

M Zheng, Y Huang, Q Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Temporal sentence grounding aims to detect the most salient moment corresponding to the
natural language query from untrimmed videos. As labeling the temporal boundaries is labor …

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …