Unleashing the potential of conversational AI: Amplifying Chat-GPT's capabilities and tackling technical hurdles
Conversational AI has seen a growing interest among government, researchers, and
industrialists. This comprehensive survey paper provides an in-depth analysis of large …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Universal weighting metric learning for cross-modal retrieval
Cross-modal retrieval has recently attracted growing attention, which aims to match
instances captured from different modalities. The performance of cross-modal retrieval …
Dual learning with dynamic knowledge distillation for partially relevant video retrieval
J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …
Reading-strategy inspired visual representation learning for text-to-video retrieval
This paper aims for the task of text-to-video retrieval, where given a query in the form of a
natural-language sentence, it is asked to retrieve videos which are semantically relevant to …
Video corpus moment retrieval with contrastive learning
Given a collection of untrimmed and unsegmented videos, video corpus moment retrieval
(VCMR) is to retrieve a temporal moment (i.e., a fraction of a video) that semantically …
Partially relevant video retrieval
Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning
oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is …
HANet: Hierarchical alignment networks for video-text retrieval
Video-text retrieval is an important yet challenging task in vision-language understanding,
which aims to learn a joint embedding space where related video and text instances are …
Lightweight attentional feature fusion: A new baseline for text-to-video retrieval
In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-
video retrieval. Different from previous research that considers feature fusion only at one …
Transferring image-CLIP to video-text retrieval via temporal relations
We present a novel network to transfer the image-language pre-trained model to video-text
retrieval in an end-to-end manner. Leading approaches in the domain of video-and …