Deep reinforcement learning in computer vision: a comprehensive survey
Deep reinforcement learning augments the reinforcement learning framework and utilizes
the powerful representation of deep neural networks. Recent works have demonstrated the …
the powerful representation of deep neural networks. Recent works have demonstrated the …
[HTML][HTML] An analytical study of information extraction from unstructured and multidimensional big data
Process of information extraction (IE) is used to extract useful information from unstructured
or semi-structured data. Big data arise new challenges for IE techniques with the rapid …
or semi-structured data. Big data arise new challenges for IE techniques with the rapid …
Univtg: Towards unified video-language temporal grounding
Abstract Video Temporal Grounding (VTG), which aims to ground target clips from videos
(such as consecutive intervals or disjoint shots) according to custom language queries (eg …
(such as consecutive intervals or disjoint shots) according to custom language queries (eg …
Egovlpv2: Egocentric video-language pre-training with fusion in the backbone
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …
generalize to various vision and language tasks. However, existing egocentric VLP …
A survey of deep learning-based object detection
Object detection is one of the most important and challenging branches of computer vision,
which has been widely applied in people's life, such as monitoring security, autonomous …
which has been widely applied in people's life, such as monitoring security, autonomous …
Video summarization using deep neural networks: A survey
Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …
selecting the most informative parts of the video content. Several approaches have been …
Align and attend: Multimodal summarization with dual contrastive losses
The goal of multimodal summarization is to extract the most important information from
different modalities to form summaries. Unlike unimodal summarization, the multimodal …
different modalities to form summaries. Unlike unimodal summarization, the multimodal …
Clip-it! language-guided video summarization
M Narasimhan, A Rohrbach… - Advances in neural …, 2021 - proceedings.neurips.cc
A generic video summary is an abridged version of a video that conveys the whole story and
features the most important scenes. Yet the importance of scenes in a video is often …
features the most important scenes. Yet the importance of scenes in a video is often …
Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward
Video summarization aims to facilitate large-scale video browsing by producing short,
concise summaries that are diverse and representative of original videos. In this paper, we …
concise summaries that are diverse and representative of original videos. In this paper, we …
Unsupervised video summarization with adversarial lstm networks
B Mahasseni, M Lam… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
This paper addresses the problem of unsupervised video summarization, formulated as
selecting a sparse subset of video frames that optimally represent the input video. Our key …
selecting a sparse subset of video frames that optimally represent the input video. Our key …