A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
G2l: Semantically aligned and uniform video grounding via geodesic and game theory
H Li, M Cao, X Cheng, Y Li, Z Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent video grounding works attempt to introduce vanilla contrastive learning into video
grounding. However, we claim that this naive solution is suboptimal. Contrastive learning …
Dual learning with dynamic knowledge distillation for partially relevant video retrieval
J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …
Partially relevant video retrieval
Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning
oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is …
A survey on temporal sentence grounding in videos
Temporal sentence grounding in videos (TSGV), which aims at localizing one target
segment from an untrimmed video with respect to a given sentence query, has drawn …
You need to read again: Multi-granularity perception network for moment retrieval in videos
Moment retrieval in videos is a challenging task that aims to retrieve the most relevant video
moment in an untrimmed video given a sentence description. Previous methods tend to …
Uncovering main causalities for long-tailed information extraction
Information Extraction (IE) aims to extract structural information from unstructured texts. In
practice, long-tailed distributions caused by the selection bias of a dataset, may lead to …
Cross-modal retrieval with partially mismatched pairs
In this paper, we study a challenging but less-touched problem in cross-modal retrieval, i.e.,
partially mismatched pairs (PMPs). Specifically, in real-world scenarios, a huge number of …
Using multimodal contrastive knowledge distillation for video-text retrieval
Cross-modal retrieval aims to enable a flexible bi-directional retrieval experience across
different modalities (e.g., searching for videos with texts). Many existing efforts tend to learn a …