A survey on video moment localization
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
G2l: Semantically aligned and uniform video grounding via geodesic and game theory
H Li, M Cao, X Cheng, Y Li, Z Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent video grounding works attempt to introduce vanilla contrastive learning into video
grounding. However, we claim that this naive solution is suboptimal. Contrastive learning …
Dual learning with dynamic knowledge distillation for partially relevant video retrieval
J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …
Partially relevant video retrieval
Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning
oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is …
A survey on temporal sentence grounding in videos
Temporal sentence grounding in videos (TSGV), which aims at localizing one target
segment from an untrimmed video with respect to a given sentence query, has drawn …
You need to read again: Multi-granularity perception network for moment retrieval in videos
Moment retrieval in videos is a challenging task that aims to retrieve the most relevant video
moment in an untrimmed video given a sentence description. Previous methods tend to …
Uncovering main causalities for long-tailed information extraction
Information Extraction (IE) aims to extract structural information from unstructured texts. In
practice, long-tailed distributions caused by the selection bias of a dataset, may lead to …
Cross-modal retrieval with partially mismatched pairs
In this paper, we study a challenging but less-touched problem in cross-modal retrieval, i.e.,
partially mismatched pairs (PMPs). Specifically, in real-world scenarios, a huge number of …
Using multimodal contrastive knowledge distillation for video-text retrieval
Cross-modal retrieval aims to enable a flexible bi-directional retrieval experience across
different modalities (e.g., searching for videos with texts). Many existing efforts tend to learn a …