Deep reinforcement learning in computer vision: a comprehensive survey

N Le, VS Rathour, K Yamazaki, K Luu… - Artificial Intelligence …, 2022 - Springer
Deep reinforcement learning augments the reinforcement learning framework and utilizes
the powerful representation of deep neural networks. Recent works have demonstrated the …

Univtg: Towards unified video-language temporal grounding

KQ Lin, P Zhang, J Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Video Temporal Grounding (VTG), which aims to ground target clips from videos
(such as consecutive intervals or disjoint shots) according to custom language queries (eg …

Timechat: A time-sensitive multimodal large language model for long video understanding

S Ren, L Yao, S Li, X Sun… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This work proposes TimeChat a time-sensitive multimodal large language model specifically
designed for long video understanding. Our model incorporates two key architectural …

Egovlpv2: Egocentric video-language pre-training with fusion in the backbone

S Pramanick, Y Song, S Nag, KQ Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …

A review on video summarization techniques

P Meena, H Kumar, SK Yadav - Engineering Applications of Artificial …, 2023 - Elsevier
The exponential growth of technology has resulted in a profusion of advanced imaging
devices and eases internet accessibility, leading to an increase in the creation and use of …

Video summarization using deep neural networks: A survey

E Apostolidis, E Adamantidou, AI Metsai… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …

Learning 2d temporal adjacent networks for moment localization with natural language

S Zhang, H Peng, J Fu, J Luo - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org
We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …

Query-dependent video representation for moment retrieval and highlight detection

WJ Moon, S Hyun, SU Park, D Park… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as
the demand for video understanding is drastically increased. The key objective of MR/HD is …

Align and attend: Multimodal summarization with dual contrastive losses

B He, J Wang, J Qiu, T Bui… - Proceedings of the …, 2023 - openaccess.thecvf.com
The goal of multimodal summarization is to extract the most important information from
different modalities to form summaries. Unlike unimodal summarization, the multimodal …

Localizing moments in video with natural language

L Anne Hendricks, O Wang… - Proceedings of the …, 2017 - openaccess.thecvf.com
We consider retrieving a specific temporal segment, or moment, from a video given a natural
language text description. Methods designed to retrieve whole video clips with natural …