Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
In recent years, the explosion of web videos makes text-video retrieval increasingly essential
and popular for video filtering, recommendation, and search. Text-video retrieval aims to …
and popular for video filtering, recommendation, and search. Text-video retrieval aims to …
Learning linguistic association towards efficient text-video retrieval
Text-video retrieval attracts growing attention recently. A dominant approach is to learn a
common space for aligning two modalities. However, video deliver richer content than text in …
common space for aligning two modalities. However, video deliver richer content than text in …
Synthesizing Videos from Images for Image-to-Video Adaptation
We address the image-to-video adaptation task that aims to leverage labeled images and
unlabeled videos for video recognition. There are two major challenges in this task …
unlabeled videos for video recognition. There are two major challenges in this task …
A simple yet effective knowledge guided method for entity-aware video captioning on a basketball benchmark
Despite the recent emergence of video captioning models, how to generate the text
description with specific entity names and fine-grained actions is far from being solved …
description with specific entity names and fine-grained actions is far from being solved …
Unsupervised Image-to-Video Adaptation via Category-aware Flow Memory Bank and Realistic Video Generation
Image-to-Video adaptation is proposed to train a model using labeled images and
unlabeled videos to facilitate the classification of unlabeled videos. The latest work …
unlabeled videos to facilitate the classification of unlabeled videos. The latest work …
Self-expressive induced clustered attention for video-text retrieval
Extensive research has proven that self-attention achieves impressive performance in video-
text retrieval. However, most state-of-the-art methods neglect the intrinsic redundancy in …
text retrieval. However, most state-of-the-art methods neglect the intrinsic redundancy in …
Knowledge Graph Supported Benchmark and Video Captioning for Basketball
Despite the recent emergence of video captioning models, how to generate the text
description with specific entity names and fine-grained actions is far from being solved …
description with specific entity names and fine-grained actions is far from being solved …
Structured Encoding Based on Semantic Disambiguation for Video Captioning
B Sun, J Tian, Y Wu, L Yu, Y Tang - Cognitive Computation, 2024 - Springer
Video captioning, which aims to automatically generate video captions, has gained
significant attention due to its wide range of applications in video surveillance and retrieval …
significant attention due to its wide range of applications in video surveillance and retrieval …