Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), also known as natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
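As a rough illustration of the task these surveys describe, the following is a minimal sketch of the TSGV/VMR interface: given per-clip video features and a sentence-query embedding, score candidate temporal segments and return the best-matching (start, end) pair. The function name, the cosine-similarity scoring, and the feature shapes are illustrative assumptions, not any surveyed paper's method.

```python
# Illustrative sketch of the TSGV/VMR task interface (assumed, not from any
# specific paper): pick the segment whose pooled feature best matches the query.
import numpy as np

def ground_sentence(clip_feats: np.ndarray,
                    query_feat: np.ndarray,
                    max_len: int = 8) -> tuple[int, int]:
    """Return (start_clip, end_clip) of the candidate segment whose mean
    clip feature has the highest cosine similarity with the query embedding."""
    n = clip_feats.shape[0]
    best, best_score = (0, 0), -np.inf
    for s in range(n):
        for e in range(s, min(n, s + max_len)):       # segments up to max_len clips
            seg = clip_feats[s:e + 1].mean(axis=0)     # pooled segment feature
            score = seg @ query_feat / (
                np.linalg.norm(seg) * np.linalg.norm(query_feat) + 1e-8)
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy usage with random stand-in features: 20 clips, 512-d features and query.
clips = np.random.randn(20, 512)
query = np.random.randn(512)
print(ground_sentence(clips, query))  # e.g. (3, 10)
```

In practice the clip and query features would come from pretrained video and language encoders rather than random arrays; the sketch only fixes the input/output shape of the grounding problem.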
A survey on temporal sentence grounding in videos
Temporal sentence grounding in videos (TSGV), which aims at localizing one target
segment from an untrimmed video with respect to a given sentence query, has drawn …
Not all inputs are valid: Towards open-set video moment retrieval using language
Video Moment Retrieval (VMR) aims to retrieve the specific moment corresponding to a
sentence query from an untrimmed video. Although recent respectable works have made …
Prompt-based zero-shot video moment retrieval
Video moment retrieval aims at localizing a specific moment from an untrimmed video by a
sentence query. Most methods rely on heavy annotations of video moment-query pairs …
Sdn: Semantic decoupling network for temporal language grounding
Temporal language grounding (TLG) is one of the most challenging cross-modal video
understanding tasks; it aims at retrieving the most relevant video segment from an …
The elements of temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), also known as natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Mrtnet: Multi-resolution temporal network for video sentence grounding
Video sentence grounding locates a specific moment in a video based on a text query.
Existing methods focus on a single temporal resolution, ignoring multi-scale temporal …
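The multi-scale point raised in the MRTNet snippet can be illustrated with a simple temporal feature pyramid; the average-pooling scheme below is an assumed stand-in for illustration only, not the paper's actual multi-resolution network.

```python
# Illustrative sketch of multi-resolution temporal features (assumed scheme):
# average-pool clip features at several strides to get coarser time scales.
import numpy as np

def temporal_pyramid(clip_feats: np.ndarray, strides=(1, 2, 4)) -> list[np.ndarray]:
    """Return one feature sequence per temporal resolution,
    obtained by mean-pooling consecutive clips at each stride."""
    levels = []
    for s in strides:
        n = clip_feats.shape[0] // s
        pooled = clip_feats[:n * s].reshape(n, s, -1).mean(axis=1)
        levels.append(pooled)  # shape: (T // s, D)
    return levels

feats = np.random.randn(16, 512)
for lvl in temporal_pyramid(feats):
    print(lvl.shape)  # (16, 512), (8, 512), (4, 512)
```

A grounding model could then score query-segment matches at each level, which is the general motivation behind multi-scale temporal modeling; how MRTNet itself fuses the resolutions is described in the paper.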
Video moment retrieval with hierarchical contrastive learning
This paper explores the task of video moment retrieval (VMR), which aims to localize the
temporal boundary of a specific moment from an untrimmed video by a sentence query …
DPHANet: Discriminative Parallel and Hierarchical Attention Network for Natural Language Video Localization
R Chen, J Tan, Z Yang, X Yang, Q Dai… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Natural Language Video Localization (NLVL) has recently attracted much attention because
of its practical significance. However, existing methods still face the following challenges …
Unsupervised Video Moment Retrieval with Knowledge-Based Pseudo-Supervision Construction
Video moment retrieval locates the moment specified by a sentence query. Recent
approaches have made remarkable advancements with large-scale video-sentence …