Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Are binary annotations sufficient? Video moment retrieval via hierarchical uncertainty-based active learning

W Ji, R Liang, Z Zheng, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on video moment retrieval has mostly focused on enhancing
accuracy, efficiency, and robustness, all of which largely rely on the …

Partial annotation-based video moment retrieval via iterative learning

W Ji, R Liang, L Liao, H Fei, F Feng - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Given a descriptive language query, Video Moment Retrieval (VMR) aims to seek the
corresponding semantic-consistent moment clip in the video, which is represented as a pair …

Skimming, locating, then perusing: A human-like framework for natural language video localization

D Liu, W Hu - Proceedings of the 30th ACM International Conference …, 2022 - dl.acm.org
This paper addresses the problem of natural language video localization (NLVL). Almost all
existing works follow the "only look once" framework that exploits a single model to directly …

Towards balanced alignment: Modal-enhanced semantic modeling for video moment retrieval

Z Liu, J Li, H Xie, P Li, J Ge, SA Liu, G Jin - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Video Moment Retrieval (VMR) aims to retrieve temporal segments in untrimmed videos
corresponding to a given language query by constructing cross-modal alignment strategies …

Rethinking the video sampling and reasoning strategies for temporal sentence grounding

J Zhu, D Liu, P Zhou, X Di, Y Cheng, S Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific
segment from an untrimmed video by a sentence query. All existing works first utilize a …

The elements of temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - arXiv preprint arXiv …, 2022 - researchgate.net
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Efficient temporal sentence grounding in videos with multi-teacher knowledge distillation

R Liang, Y Yang, H Lu, L Li - arXiv preprint arXiv:2308.03725, 2023 - arxiv.org
Temporal Sentence Grounding in Videos (TSGV) aims to detect the event timestamps
described by the natural language query from untrimmed videos. This paper discusses the …

Entity-aware and motion-aware transformers for language-driven action localization in videos

S Yang, X Wu - arXiv preprint arXiv:2205.05854, 2022 - arxiv.org
Language-driven action localization in videos is a challenging task that involves not only
visual-linguistic matching but also action boundary prediction. Recent progress has been …

MRTNet: Multi-resolution temporal network for video sentence grounding

W Ji, Y Qin, L Chen, Y Wei, Y Wu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Video sentence grounding locates a specific moment in a video based on a text query.
Existing methods focus on single temporal resolution, ignoring multi-scale temporal …