A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Weakly supervised temporal sentence grounding with gaussian-based contrastive proposal learning
Temporal sentence grounding aims to detect the most salient moment corresponding to the
natural language query from untrimmed videos. As labeling the temporal boundaries is labor …
You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …
Are binary annotations sufficient? Video moment retrieval via hierarchical uncertainty-based active learning
Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …
Weakly supervised video moment localization with contrastive negative sample mining
Video moment localization aims at localizing the video segments which are most related to
the given free-form natural language query. The weakly supervised setting, where only …
Weakly supervised temporal sentence grounding with uncertainty-guided self-training
The task of weakly supervised temporal sentence grounding aims at finding the
corresponding temporal moments of a language description in the video, given video …
Cascaded prediction network via segment tree for temporal video grounding
Temporal video grounding aims to localize the target segment which is semantically aligned
with the given sentence in an untrimmed video. Existing methods can be divided into two …
Cross-sentence temporal and semantic relations in video activity localisation
Video activity localisation has recently attained increasing attention due to its practical
values in automatically localising the most salient visual segments corresponding to their …
Unsupervised temporal video grounding with deep semantic clustering
Temporal video grounding (TVG) aims to localize a target segment in a video according to a
given sentence query. Though respectable works have made decent achievements in this …