Deep reinforcement learning in computer vision: a comprehensive survey
Deep reinforcement learning augments the reinforcement learning framework and utilizes
the powerful representation of deep neural networks. Recent works have demonstrated the …
UniVTG: Towards unified video-language temporal grounding
Video Temporal Grounding (VTG), which aims to ground target clips from videos
(such as consecutive intervals or disjoint shots) according to custom language queries (eg …
TimeChat: A time-sensitive multimodal large language model for long video understanding
This work proposes TimeChat, a time-sensitive multimodal large language model specifically
designed for long video understanding. Our model incorporates two key architectural …
EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …
A review on video summarization techniques
The exponential growth of technology has resulted in a profusion of advanced imaging
devices and eases internet accessibility, leading to an increase in the creation and use of …
Video summarization using deep neural networks: A survey
Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …
Learning 2D temporal adjacent networks for moment localization with natural language
We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …
Query-dependent video representation for moment retrieval and highlight detection
Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as
the demand for video understanding is drastically increased. The key objective of MR/HD is …
Align and attend: Multimodal summarization with dual contrastive losses
The goal of multimodal summarization is to extract the most important information from
different modalities to form summaries. Unlike unimodal summarization, the multimodal …
Localizing moments in video with natural language
L Anne Hendricks, O Wang… - Proceedings of the …, 2017 - openaccess.thecvf.com
We consider retrieving a specific temporal segment, or moment, from a video given a natural
language text description. Methods designed to retrieve whole video clips with natural …