Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), also known as natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
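
As a concrete illustration of the task these papers share, the sketch below expresses the TSGV contract as a toy sliding-window baseline in Python: given per-clip video features and an encoded sentence query, return the (start, end) span that scores highest against the query. Every name here (ground_sentence, score_clip, the feature shapes) is an assumption made for illustration; none of it is an interface from the survey.

    # Toy TSGV baseline; all names are illustrative assumptions.
    from typing import List, Tuple

    def score_clip(clip_feature: List[float], query_feature: List[float]) -> float:
        """Toy similarity: dot product between a clip feature and the query feature."""
        return sum(c * q for c, q in zip(clip_feature, query_feature))

    def ground_sentence(
        clip_features: List[List[float]],  # one feature vector per fixed-length clip
        query_feature: List[float],        # encoded sentence query
        clip_seconds: float = 1.0,         # duration covered by each clip
        max_clips: int = 8,                # longest candidate window, in clips
    ) -> Tuple[float, float]:
        """Exhaustively score sliding windows; return the best (start, end) in seconds."""
        best_score, best_span = float("-inf"), (0.0, clip_seconds)
        for i in range(len(clip_features)):
            for j in range(i + 1, min(i + max_clips, len(clip_features)) + 1):
                window = clip_features[i:j]
                score = sum(score_clip(c, query_feature) for c in window) / len(window)
                if score > best_score:
                    best_score, best_span = score, (i * clip_seconds, j * clip_seconds)
        return best_span

Proposal-based, proposal-free, and DETR-style methods all implement this same input-to-span contract; they differ in how the scoring and span selection are learned rather than enumerated.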

Knowing where to focus: Event-aware transformer for video grounding

J Jang, J Park, J Kim, H Kwon… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recent DETR-based video grounding models directly predict moment timestamps without any
hand-crafted components, such as a pre-defined proposal or non …
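
To make "directly predicting moment timestamps" more tangible, here is a minimal PyTorch-style sketch of a DETR-like moment decoder head. The class name, layer sizes, and output format are illustrative assumptions, not the architecture this paper proposes.

    # Sketch of a DETR-style moment head; names and sizes are assumptions.
    import torch
    import torch.nn as nn

    class MomentDETRHead(nn.Module):
        def __init__(self, d_model: int = 256, num_queries: int = 10):
            super().__init__()
            # Learnable moment queries stand in for hand-crafted proposals.
            self.queries = nn.Embedding(num_queries, d_model)
            self.decoder = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
                num_layers=2,
            )
            # Each query regresses a normalized (center, width) span directly,
            # so no proposal enumeration or NMS stage is needed.
            self.span_head = nn.Linear(d_model, 2)
            self.score_head = nn.Linear(d_model, 1)  # foreground confidence

        def forward(self, video_memory: torch.Tensor) -> torch.Tensor:
            # video_memory: (batch, num_clips, d_model) fused video-text features
            batch = video_memory.size(0)
            queries = self.queries.weight.unsqueeze(0).expand(batch, -1, -1)
            hidden = self.decoder(queries, video_memory)
            spans = self.span_head(hidden).sigmoid()  # (batch, num_queries, 2)
            scores = self.score_head(hidden)          # (batch, num_queries, 1)
            return torch.cat([spans, scores], dim=-1)

The design point the sketch mirrors is that learnable moment queries regress normalized spans directly, removing the proposal-generation and non-maximum-suppression stages of earlier pipelines.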

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate the target
moment that semantically matches a sentence query. Although previous respectable works …

Towards generalisable video moment retrieval: Visual-dynamic injection to image-text pre-training

D Luo, J Huang, S Gong, H Jin… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The correlation between vision and text is essential for video moment retrieval (VMR);
however, existing methods heavily rely on separate pre-training feature extractors for visual …

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed to
short durations. However, in practice, videos are generally untrimmed, containing much …

Skimming, locating, then perusing: A human-like framework for natural language video localization

D Liu, W Hu - Proceedings of the 30th ACM International Conference …, 2022 - dl.acm.org
This paper addresses the problem of natural language video localization (NLVL). Almost all
existing works follow the "only look once" framework that exploits a single model to directly …

Partial annotation-based video moment retrieval via iterative learning

W Ji, R Liang, L Liao, H Fei, F Feng - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Given a descriptive language query, Video Moment Retrieval (VMR) aims to seek the
corresponding semantically consistent moment clip in the video, which is represented as a pair …

Hypotheses tree building for one-shot temporal sentence localization

D Liu, X Fang, P Zhou, X Di, W Lu… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific
segment according to a given sentence query. Although respectable works have made …

Few-shot temporal sentence grounding via memory-guided semantic learning

D Liu, P Zhou, Z Xu, H Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Temporal sentence grounding (TSG) is an important yet challenging task in video-based
information retrieval. Given an untrimmed video input, it requires the machine to predict the …

Unsupervised domain adaptation for video object grounding with cascaded debiasing learning

M Li, H Zhang, J Li, Z Zhao, W Zhang, S Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org
This paper addresses Unsupervised Domain Adaptation (UDA) for the dense frame
prediction task of Video Object Grounding (VOG). This investigation springs from the …