Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Are binary annotations sufficient? video moment retrieval via hierarchical uncertainty-based active learning
Recent research on video moment retrieval has mostly focused on improving
accuracy, efficiency, and robustness, all of which largely rely on the …
Partial annotation-based video moment retrieval via iterative learning
Given a descriptive language query, Video Moment Retrieval (VMR) aims to seek the
corresponding semantic-consistent moment clip in the video, which is represented as a pair …
Skimming, locating, then perusing: A human-like framework for natural language video localization
This paper addresses the problem of natural language video localization (NLVL). Almost all
existing works follow the "only look once" framework that exploits a single model to directly …
Towards balanced alignment: Modal-enhanced semantic modeling for video moment retrieval
Video Moment Retrieval (VMR) aims to retrieve temporal segments in untrimmed videos
corresponding to a given language query by constructing cross-modal alignment strategies …
Rethinking the video sampling and reasoning strategies for temporal sentence grounding
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific
segment from an untrimmed video given a sentence query. All existing works first utilize a …
The elements of temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Efficient temporal sentence grounding in videos with multi-teacher knowledge distillation
Temporal Sentence Grounding in Videos (TSGV) aims to detect the event timestamps
described by the natural language query from untrimmed videos. This paper discusses the …
Entity-aware and motion-aware transformers for language-driven action localization in videos
Language-driven action localization in videos is a challenging task that involves not only
visual-linguistic matching but also action boundary prediction. Recent progress has been …
MRTNet: Multi-resolution temporal network for video sentence grounding
Video sentence grounding locates a specific moment in a video based on a text query.
Existing methods focus on single temporal resolution, ignoring multi-scale temporal …