Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Are binary annotations sufficient? video moment retrieval via hierarchical uncertainty-based active learning
Recent research on video moment retrieval has mostly focused on improving
accuracy, efficiency, and robustness, all of which largely rely on the …
Partial annotation-based video moment retrieval via iterative learning
Given a descriptive language query, Video Moment Retrieval (VMR) aims to seek the
corresponding semantic-consistent moment clip in the video, which is represented as a pair …
Skimming, locating, then perusing: A human-like framework for natural language video localization
This paper addresses the problem of natural language video localization (NLVL). Almost all
existing works follow the "only look once" framework that exploits a single model to directly …
Towards balanced alignment: Modal-enhanced semantic modeling for video moment retrieval
Video Moment Retrieval (VMR) aims to retrieve temporal segments in untrimmed videos
corresponding to a given language query by constructing cross-modal alignment strategies …
Rethinking the video sampling and reasoning strategies for temporal sentence grounding
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific
segment from an untrimmed video given a sentence query. All existing works first utilize a …
The elements of temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Efficient temporal sentence grounding in videos with multi-teacher knowledge distillation
Temporal Sentence Grounding in Videos (TSGV) aims to detect the event timestamps
described by the natural language query from untrimmed videos. This paper discusses the …
Entity-aware and motion-aware transformers for language-driven action localization in videos
Language-driven action localization in videos is a challenging task that involves not only
visual-linguistic matching but also action boundary prediction. Recent progress has been …
MRTNet: Multi-resolution temporal network for video sentence grounding
Video sentence grounding locates a specific moment in a video based on a text query.
Existing methods focus on single temporal resolution, ignoring multi-scale temporal …