A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
TubeDETR: Spatio-temporal video grounding with transformers
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a
given text query. This is a challenging task that requires the joint and efficient modeling of …
Context-aware biaffine localizing network for temporal sentence grounding
This paper addresses the problem of temporal sentence grounding (TSG), which aims to
identify the temporal boundary of a specific segment from an untrimmed video by a sentence …
Local-global video-text interactions for temporal grounding
This paper addresses the problem of text-to-video temporal grounding, which aims to
identify the time interval in a video semantically relevant to a text query. We tackle this …
Boundary proposal network for two-stage natural language video localization
We aim to address the problem of Natural Language Video Localization (NLVL)—localizing
the video segment corresponding to a natural language description in a long and untrimmed …
Negative sample matters: A renaissance of metric learning for temporal grounding
Temporal grounding aims to localize a video moment which is semantically aligned with a
given natural language query. Existing methods typically apply a detection or regression …
Mindstorms in natural language-based societies of mind
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse
societies of large multimodal neural networks (NNs) that solve problems by interviewing …
Fast video moment retrieval
This paper targets at fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …
Semantic conditioned dynamic modulation for temporal sentence grounding in videos
Temporal sentence grounding in videos aims to detect and localize one target video
segment, which semantically corresponds to a given sentence. Existing methods mainly …