Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Multi-modal cross-domain alignment network for video moment retrieval

X Fang, D Liu, P Zhou, Y Hu - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
As an increasingly popular task in multimedia information retrieval, video moment retrieval
(VMR) aims to localize the target moment from an untrimmed video according to a given …

MS2GAH: Multi-label semantic supervised graph attention hashing for robust cross-modal retrieval

Y Duan, N Chen, P Zhang, N Kumar, L Chang… - Pattern Recognition, 2022 - Elsevier
Due to the strong nonlinear representation capabilities of deep neural networks and the low
storage and high efficiency characteristics of hash learning, deep cross-modal hashing has …

Video moment retrieval via comprehensive relation-aware network

X Sun, J Gao, Y Zhu, X Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video moment retrieval aims to retrieve a target moment from an untrimmed video that
semantically corresponds to the given language query. Existing methods commonly treat it …

Toward video anomaly retrieval from video anomaly detection: New benchmarks and model

P Wu, J Liu, X He, Y Peng, P Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Video anomaly detection (VAD) has been paid increasing attention due to its potential
applications, its current dominant tasks focus on online detecting anomalies, which can be …

Hierarchical local-global transformer for temporal sentence grounding

X Fang, D Liu, P Zhou, Z Xu, R Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
This article studies the multimedia problem of temporal sentence grounding (TSG), which
aims to accurately determine the specific video segment in an untrimmed video according to …

DHHN: Dual hierarchical hybrid network for weakly-supervised audio-visual video parsing

X Jiang, X Xu, Z Chen, J Zhang, J Song… - Proceedings of the 30th …, 2022 - dl.acm.org
The Weakly-Supervised Audio-Visual Video Parsing (AVVP) task aims to parse a video into
temporal segments and predict their event categories in terms of modalities, labeling them …

Zero-shot video moment retrieval with angular reconstructive text embeddings

X Jiang, X Xu, Z Zhou, Y Yang, F Shen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at
retrieving a specific moment where the video content is semantically related to the text …

Joint searching and grounding: Multi-granularity video content retrieval

Z Chen, X Jiang, X Xu, Z Cao, Y Mo… - Proceedings of the 31st …, 2023 - dl.acm.org
Text-based video retrieval is a well-studied task aimed at retrieving relevant videos from a
large collection in response to a given text query. Most existing TVR works assume that …

Multi-feature embedded learning SVM for cloud detection in remote sensing images

W Zhang, S Jin, L Zhou, X Xie, F Wang, L Jiang… - Computers and …, 2022 - Elsevier
To improve remote sensing image transmission efficiency, we propose a cloud detection
method using a multi-feature embedded learning support vector machine (SVM) to address …