A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search for a target segment within a video described by a given natural language query. Beyond the task of …
MomentDiff: Generative video moment retrieval from random to real
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), also known as natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Video transformers: A survey
Transformer models have shown great success in handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale …
You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …
G2L: Semantically aligned and uniform video grounding via geodesic and game theory
Recent video grounding works attempt to introduce vanilla contrastive learning into video grounding. However, we claim that this naive solution is suboptimal. Contrastive learning …
Local-global context aware transformer for language-guided video segmentation
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …
Weakly supervised video moment localization with contrastive negative sample mining
Video moment localization aims to localize the video segments that are most related to the given free-form natural language query. The weakly supervised setting, where only …
ViGT: Proposal-free video grounding with a learnable token in the transformer
The video grounding (VG) task aims to locate the queried action or event in an untrimmed
video based on rich linguistic descriptions. Existing proposal-free methods are trapped in …
Joint video summarization and moment localization by cross-task sample transfer
Video summarization has recently attracted increasing attention in computer vision communities. However, the scarcity of annotated data has been a key obstacle in this task …