Cross-modal dynamic networks for video moment retrieval with text query

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Cited by 47 · All 8 versions

Multi-modal cross-domain alignment network for video moment retrieval

X Fang, D Liu, P Zhou, Y Hu - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org

As an increasingly popular task in multimedia information retrieval, video moment retrieval
(VMR) aims to localize the target moment from an untrimmed video according to a given …

Cited by 40 · All 4 versions

MS2GAH: Multi-label semantic supervised graph attention hashing for robust cross-modal retrieval

Y Duan, N Chen, P Zhang, N Kumar, L Chang… - Pattern Recognition, 2022 - Elsevier

Due to the strong nonlinear representation capabilities of deep neural networks and the low
storage and high efficiency characteristics of hash learning, deep cross-modal hashing has …

Cited by 37 · All 4 versions

Video moment retrieval via comprehensive relation-aware network

X Sun, J Gao, Y Zhu, X Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Video moment retrieval aims to retrieve a target moment from an untrimmed video that
semantically corresponds to the given language query. Existing methods commonly treat it …

Cited by 31 · All 3 versions

Toward video anomaly retrieval from video anomaly detection: New benchmarks and model

P Wu, J Liu, X He, Y Peng, P Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Video anomaly detection (VAD) has received increasing attention due to its potential
applications; its current dominant tasks focus on detecting anomalies online, which can be …

Cited by 17 · All 6 versions

Hierarchical local-global transformer for temporal sentence grounding

X Fang, D Liu, P Zhou, Z Xu, R Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

This article studies the multimedia problem of temporal sentence grounding (TSG), which
aims to accurately determine the specific video segment in an untrimmed video according to …

Cited by 36 · All 4 versions

DHHN: Dual hierarchical hybrid network for weakly-supervised audio-visual video parsing

X Jiang, X Xu, Z Chen, J Zhang, J Song… - Proceedings of the 30th …, 2022 - dl.acm.org

The Weakly-Supervised Audio-Visual Video Parsing (AVVP) task aims to parse a video into
temporal segments and predict their event categories in terms of modalities, labeling them …

Cited by 29

Zero-shot video moment retrieval with angular reconstructive text embeddings

X Jiang, X Xu, Z Zhou, Y Yang, F Shen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at
retrieving a specific moment where the video content is semantically related to the text …

Cited by 4 · All 3 versions

Joint searching and grounding: Multi-granularity video content retrieval

Z Chen, X Jiang, X Xu, Z Cao, Y Mo… - Proceedings of the 31st …, 2023 - dl.acm.org

Text-based video retrieval is a well-studied task aimed at retrieving relevant videos from a
large collection in response to a given text query. Most existing TVR works assume that …

Cited by 14 · All 2 versions

Multi-feature embedded learning SVM for cloud detection in remote sensing images

W Zhang, S Jin, L Zhou, X Xie, F Wang, L Jiang… - Computers and …, 2022 - Elsevier

To improve remote sensing image transmission efficiency, we propose a cloud detection
method using a multi-feature embedded learning support vector machine (SVM) to address …

Cited by 21 · All 2 versions