- Academic resource search

A survey of traditional and deep learning-based feature descriptors for high dimensional data in computer vision

T Georgiou, Y Liu, W Chen, M Lew - International Journal of Multimedia …, 2020 - Springer

Higher dimensional data such as video and 3D are the leading edge of multimedia retrieval
and computer vision research. In this survey, we give a comprehensive overview and key …

Cited by 161 · Related articles · All 8 versions

[PDF] arxiv.org

A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org

Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Cited by 30 · Related articles · All 4 versions

[PDF] neurips.cc

MomentDiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc

Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Cited by 43 · Related articles · All 6 versions

[PDF] neurips.cc

Detecting moments and highlights in videos via natural language queries

J Lei, TL Berg, M Bansal - Advances in Neural Information …, 2021 - proceedings.neurips.cc

Detecting customized moments and highlights from videos given natural language (NL) user
queries is an important but under-studied topic. One of the challenges in pursuing this …

Cited by 194 · Related articles · All 7 versions

[PDF] aaai.org

Learning 2D temporal adjacent networks for moment localization with natural language

S Zhang, H Peng, J Fu, J Luo - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org

We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …

Cited by 465 · Related articles · All 7 versions

[PDF] thecvf.com

Image retrieval on real-life images with pre-trained vision-and-language models

Z Liu, C Rodriguez-Opazo… - Proceedings of the …, 2021 - openaccess.thecvf.com

We extend the task of composed image retrieval, where an input query consists of an image
and short textual description of how to modify the image. Existing methods have only been …

Cited by 166 · Related articles · All 7 versions

[PDF] thecvf.com

3DJCG: A unified framework for joint dense captioning and visual grounding on 3D point clouds

D Cai, L Zhao, J Zhang, L Sheng… - Proceedings of the …, 2022 - openaccess.thecvf.com

Observing that the 3D captioning task and the 3D grounding task contain both shared and
complementary information in nature, in this work, we propose a unified framework to jointly …

Cited by 91 · Related articles · All 3 versions

[PDF] thecvf.com

Interventional video grounding with dual contrastive learning

G Nan, R Qiao, Y Xiao, J Liu, S Leng… - Proceedings of the …, 2021 - openaccess.thecvf.com

Video grounding aims to localize a moment from an untrimmed video for a given textual
query. Existing approaches focus more on the alignment of visual and language stimuli with …

Cited by 157 · Related articles · All 6 versions

[PDF] thecvf.com

Dense regression network for video grounding

R Zeng, H Xu, W Huang, P Chen… - Proceedings of the …, 2020 - openaccess.thecvf.com

We address the problem of video grounding from natural language queries. The key
challenge in this task is that one training video might only contain a few annotated …

Cited by 297 · Related articles · All 10 versions

[PDF] arxiv.org

Span-based localizing network for natural language video localization

H Zhang, A Sun, W Jing, JT Zhou - arXiv preprint arXiv:2004.13931, 2020 - arxiv.org

Given an untrimmed video and a text query, natural language video localization (NLVL) is to
locate a matching span from the video that semantically corresponds to the query. Existing …

Cited by 312 · Related articles · All 6 versions