A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
TubeDETR: Spatio-temporal video grounding with transformers
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a
given text query. This is a challenging task that requires the joint and efficient modeling of …
Can I trust your answer? Visually grounded video question answering
We study visually grounded VideoQA in response to the emerging trends of utilizing
pretraining techniques for video-language understanding. Specifically by forcing vision …
Injecting semantic concepts into end-to-end image captioning
Tremendous progress has been made in recent years in developing better image captioning
models, yet most of them rely on a separate object detector to extract regional features …
Interventional video grounding with dual contrastive learning
Video grounding aims to localize a moment from an untrimmed video for a given textual
query. Existing approaches focus more on the alignment of visual and language stimuli with …
Context-aware biaffine localizing network for temporal sentence grounding
This paper addresses the problem of temporal sentence grounding (TSG), which aims to
identify the temporal boundary of a specific segment from an untrimmed video by a sentence …
Fast video moment retrieval
This paper targets at fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …
Weakly supervised temporal sentence grounding with Gaussian-based contrastive proposal learning
Temporal sentence grounding aims to detect the most salient moment corresponding to the
natural language query from untrimmed videos. As labeling the temporal boundaries is labor …
You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos
X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …