Unleashing the potential of conversational AI: Amplifying Chat-GPT's capabilities and tackling technical hurdles

V Hassija, A Chakrabarti, A Singh, V Chamola… - IEEE …, 2023 - ieeexplore.ieee.org
Conversational AI has seen a growing interest among government, researchers, and
industrialists. This comprehensive survey paper provides an in-depth analysis of large …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Universal weighting metric learning for cross-modal retrieval

J Wei, Y Yang, X Xu, X Zhu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Cross-modal retrieval has recently attracted growing attention, which aims to match
instances captured from different modalities. The performance of cross-modal retrieval …

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …

Reading-strategy inspired visual representation learning for text-to-video retrieval

J Dong, Y Wang, X Chen, X Qu, X Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
This paper aims for the task of text-to-video retrieval, where given a query in the form of a
natural-language sentence, it is asked to retrieve videos which are semantically relevant to …

Video corpus moment retrieval with contrastive learning

H Zhang, A Sun, W Jing, G Nan, L Zhen… - Proceedings of the 44th …, 2021 - dl.acm.org
Given a collection of untrimmed and unsegmented videos, video corpus moment retrieval
(VCMR) is to retrieve a temporal moment (i.e., a fraction of a video) that semantically …

Partially relevant video retrieval

J Dong, X Chen, M Zhang, X Yang, S Chen… - Proceedings of the 30th …, 2022 - dl.acm.org
Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning
oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is …

HANet: Hierarchical alignment networks for video-text retrieval

P Wu, X He, M Tang, Y Lv, J Liu - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Video-text retrieval is an important yet challenging task in vision-language understanding,
which aims to learn a joint embedding space where related video and text instances are …

Lightweight attentional feature fusion: A new baseline for text-to-video retrieval

F Hu, A Chen, Z Wang, F Zhou, J Dong, X Li - European conference on …, 2022 - Springer
In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-
video retrieval. Different from previous research that considers feature fusion only at one …

Transferring image-CLIP to video-text retrieval via temporal relations

H Fang, P Xiong, L Xu, W Luo - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
We present a novel network to transfer the image-language pre-trained model to video-text
retrieval in an end-to-end manner. Leading approaches in the domain of video-and …