A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Momentdiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …

G2l: Semantically aligned and uniform video grounding via geodesic and game theory

H Li, M Cao, X Cheng, Y Li, Z Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent video grounding works attempt to introduce vanilla contrastive learning into video
grounding. However, we claim that this naive solution is suboptimal. Contrastive learning …

Local-global context aware transformer for language-guided video segmentation

C Liang, W Wang, T Zhou, J Miao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …

Weakly supervised video moment localization with contrastive negative sample mining

M Zheng, Y Huang, Q Chen, Y Liu - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Video moment localization aims at localizing the video segments which are most related to
the given free-form natural language query. The weakly supervised setting, where only …

ViGT: proposal-free video grounding with a learnable token in the transformer

K Li, D Guo, M Wang - Science China Information Sciences, 2023 - Springer
The video grounding (VG) task aims to locate the queried action or event in an untrimmed
video based on rich linguistic descriptions. Existing proposal-free methods are trapped in …

Joint video summarization and moment localization by cross-task sample transfer

H Jiang, Y Mu - Proceedings of the IEEE/CVF Conference …, 2022 - openaccess.thecvf.com
Video summarization has recently engaged increasing attention in computer vision
communities. However, the scarcity of annotated data has been a key obstacle in this task …