A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search for a target segment within a video described by a given natural language query. Beyond the task of …
MomentDiff: Generative video moment retrieval from random to real
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), also known as natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Video transformers: A survey
Transformer models have shown great success in handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale …
You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …
G2L: Semantically aligned and uniform video grounding via geodesic and game theory
Recent video grounding works attempt to introduce vanilla contrastive learning into video grounding. However, we claim that this naive solution is suboptimal. Contrastive learning …
Local-global context aware transformer for language-guided video segmentation
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …
Weakly supervised video moment localization with contrastive negative sample mining
Video moment localization aims to localize the video segments that are most related to the given free-form natural language query. The weakly supervised setting, where only …
ViGT: Proposal-free video grounding with a learnable token in the transformer
The video grounding (VG) task aims to locate the queried action or event in an untrimmed
video based on rich linguistic descriptions. Existing proposal-free methods are trapped in …
Joint video summarization and moment localization by cross-task sample transfer
Video summarization has recently attracted increasing attention in computer vision communities. However, the scarcity of annotated data has been a key obstacle in this task …