Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Multi-modal cross-domain alignment network for video moment retrieval
As an increasingly popular task in multimedia information retrieval, video moment retrieval
(VMR) aims to localize the target moment from an untrimmed video according to a given …
MS2GAH: Multi-label semantic supervised graph attention hashing for robust cross-modal retrieval
Due to the strong nonlinear representation capabilities of deep neural networks and the low
storage and high efficiency characteristics of hash learning, deep cross-modal hashing has …
Video moment retrieval via comprehensive relation-aware network
Video moment retrieval aims to retrieve a target moment from an untrimmed video that
semantically corresponds to the given language query. Existing methods commonly treat it …
Toward video anomaly retrieval from video anomaly detection: New benchmarks and model
Video anomaly detection (VAD) has been paid increasing attention due to its potential
applications, its current dominant tasks focus on online detecting anomalies, which can be …
Hierarchical local-global transformer for temporal sentence grounding
This article studies the multimedia problem of temporal sentence grounding (TSG), which
aims to accurately determine the specific video segment in an untrimmed video according to …
Dhhn: Dual hierarchical hybrid network for weakly-supervised audio-visual video parsing
The Weakly-Supervised Audio-Visual Video Parsing (AVVP) task aims to parse a video into
temporal segments and predict their event categories in terms of modalities, labeling them …
Zero-shot video moment retrieval with angular reconstructive text embeddings
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at
retrieving a specific moment where the video content is semantically related to the text …
Joint searching and grounding: Multi-granularity video content retrieval
Text-based video retrieval is a well-studied task aimed at retrieving relevant videos from a
large collection in response to a given text query. Most existing TVR works assume that …
Multi-feature embedded learning SVM for cloud detection in remote sensing images
W Zhang, S Jin, L Zhou, X Xie, F Wang, L Jiang… - Computers and …, 2022 - Elsevier
To improve remote sensing image transmission efficiency, we propose a cloud detection
method using a multi-feature embedded learning support vector machine (SVM) to address …