A survey of traditional and deep learning-based feature descriptors for high dimensional data in computer vision

T Georgiou, Y Liu, W Chen, M Lew - International Journal of Multimedia …, 2020 - Springer
Higher-dimensional data such as video and 3D are the leading edge of multimedia retrieval
and computer vision research. In this survey, we give a comprehensive overview and key …

A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

MomentDiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Detecting moments and highlights in videos via natural language queries

J Lei, TL Berg, M Bansal - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Detecting customized moments and highlights from videos given natural language (NL) user
queries is an important but under-studied topic. One of the challenges in pursuing this …

Learning 2D temporal adjacent networks for moment localization with natural language

S Zhang, H Peng, J Fu, J Luo - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org
We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …

Image retrieval on real-life images with pre-trained vision-and-language models

Z Liu, C Rodriguez-Opazo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We extend the task of composed image retrieval, where an input query consists of an image
and short textual description of how to modify the image. Existing methods have only been …

3DJCG: A unified framework for joint dense captioning and visual grounding on 3D point clouds

D Cai, L Zhao, J Zhang, L Sheng… - Proceedings of the …, 2022 - openaccess.thecvf.com
Observing that the 3D captioning task and the 3D grounding task contain both shared and
complementary information in nature, in this work, we propose a unified framework to jointly …

Interventional video grounding with dual contrastive learning

G Nan, R Qiao, Y Xiao, J Liu, S Leng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Video grounding aims to localize a moment from an untrimmed video for a given textual
query. Existing approaches focus more on the alignment of visual and language stimuli with …

Dense regression network for video grounding

R Zeng, H Xu, W Huang, P Chen… - Proceedings of the …, 2020 - openaccess.thecvf.com
We address the problem of video grounding from natural language queries. The key
challenge in this task is that one training video might only contain a few annotated …

Span-based localizing network for natural language video localization

H Zhang, A Sun, W Jing, JT Zhou - arXiv preprint arXiv:2004.13931, 2020 - arxiv.org
Given an untrimmed video and a text query, natural language video localization (NLVL) is to
locate a matching span from the video that semantically corresponds to the query. Existing …