A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

G2L: Semantically aligned and uniform video grounding via geodesic and game theory

H Li, M Cao, X Cheng, Y Li, Z Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent video grounding works attempt to introduce vanilla contrastive learning into video
grounding. However, we claim that this naive solution is suboptimal. Contrastive learning …

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …

Partially relevant video retrieval

J Dong, X Chen, M Zhang, X Yang, S Chen… - Proceedings of the 30th …, 2022 - dl.acm.org
Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning
oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is …

A survey on temporal sentence grounding in videos

X Lan, Y Yuan, X Wang, Z Wang, W Zhu - ACM Transactions on …, 2023 - dl.acm.org
Temporal sentence grounding in videos (TSGV), which aims at localizing one target
segment from an untrimmed video with respect to a given sentence query, has drawn …

You need to read again: Multi-granularity perception network for moment retrieval in videos

X Sun, X Wang, J Gao, Q Liu, X Zhou - Proceedings of the 45th …, 2022 - dl.acm.org
Moment retrieval in videos is a challenging task that aims to retrieve the most relevant video
moment in an untrimmed video given a sentence description. Previous methods tend to …

Uncovering main causalities for long-tailed information extraction

G Nan, J Zeng, R Qiao, Z Guo, W Lu - arXiv preprint arXiv:2109.05213, 2021 - arxiv.org
Information Extraction (IE) aims to extract structural information from unstructured texts. In
practice, long-tailed distributions caused by the selection bias of a dataset, may lead to …

Cross-modal retrieval with partially mismatched pairs

P Hu, Z Huang, D Peng, X Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this paper, we study a challenging but less-touched problem in cross-modal retrieval, i.e.,
partially mismatched pairs (PMPs). Specifically, in real-world scenarios, a huge number of …

Using multimodal contrastive knowledge distillation for video-text retrieval

W Ma, Q Chen, T Zhou, S Zhao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Cross-modal retrieval aims to enable a flexible bi-directional retrieval experience across
different modalities (e.g., searching for videos with texts). Many existing efforts tend to learn a …