Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Knowing where to focus: Event-aware transformer for video grounding
Recent DETR-based video grounding models directly predict moment
timestamps without any hand-crafted components, such as a pre-defined proposal or non …
You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …
Towards generalisable video moment retrieval: Visual-dynamic injection to image-text pre-training
The correlation between vision and text is essential for video moment retrieval (VMR);
however, existing methods heavily rely on separate pre-training feature extractors for visual …
Dual learning with dynamic knowledge distillation for partially relevant video retrieval
J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …
Skimming, locating, then perusing: A human-like framework for natural language video localization
This paper addresses the problem of natural language video localization (NLVL). Almost all
existing works follow the "only look once" framework that exploits a single model to directly …
Partial annotation-based video moment retrieval via iterative learning
Given a descriptive language query, Video Moment Retrieval (VMR) aims to seek the
corresponding semantic-consistent moment clip in the video, which is represented as a pair …
Hypotheses tree building for one-shot temporal sentence localization
Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific
segment according to a given sentence query. Though respectable works have made …
Few-shot temporal sentence grounding via memory-guided semantic learning
Temporal sentence grounding (TSG) is an important yet challenging task in video-based
information retrieval. Given an untrimmed video input, it requires the machine to predict the …
Unsupervised domain adaptation for video object grounding with cascaded debiasing learning
This paper addresses Unsupervised Domain Adaptation (UDA) for the dense frame
prediction task of Video Object Grounding (VOG). This investigation springs from the …