TVSum: Summarizing web videos using titles

N Le, VS Rathour, K Yamazaki, K Luu… - Artificial Intelligence …, 2022 - Springer

Deep reinforcement learning augments the reinforcement learning framework and utilizes
the powerful representation of deep neural networks. Recent works have demonstrated the …

Cited by 160 · Related articles · All 10 versions

UniVTG: Towards unified video-language temporal grounding

KQ Lin, P Zhang, J Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Video Temporal Grounding (VTG), which aims to ground target clips from videos
(such as consecutive intervals or disjoint shots) according to custom language queries (e.g. …

Cited by 44 · Related articles · All 4 versions

TimeChat: A time-sensitive multimodal large language model for long video understanding

S Ren, L Yao, S Li, X Sun… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

This work proposes TimeChat, a time-sensitive multimodal large language model specifically
designed for long video understanding. Our model incorporates two key architectural …

Cited by 27 · Related articles · All 4 versions

EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone

S Pramanick, Y Song, S Nag, KQ Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com

Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …

Cited by 31 · Related articles · All 6 versions

A review on video summarization techniques

P Meena, H Kumar, SK Yadav - Engineering Applications of Artificial …, 2023 - Elsevier

The exponential growth of technology has resulted in a profusion of advanced imaging
devices and eases internet accessibility, leading to an increase in the creation and use of …

Cited by 17 · Related articles · All 2 versions

Video summarization using deep neural networks: A survey

E Apostolidis, E Adamantidou, AI Metsai… - Proceedings of the …, 2021 - ieeexplore.ieee.org

Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …

Cited by 206 · Related articles · All 8 versions

Learning 2D temporal adjacent networks for moment localization with natural language

S Zhang, H Peng, J Fu, J Luo - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org

We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …

Cited by 407 · Related articles · All 7 versions

Query-dependent video representation for moment retrieval and highlight detection

WJ Moon, S Hyun, SU Park, D Park… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as
the demand for video understanding is drastically increased. The key objective of MR/HD is …

Cited by 42 · Related articles · All 5 versions

Align and attend: Multimodal summarization with dual contrastive losses

B He, J Wang, J Qiu, T Bui… - Proceedings of the …, 2023 - openaccess.thecvf.com

The goal of multimodal summarization is to extract the most important information from
different modalities to form summaries. Unlike unimodal summarization, the multimodal …

Cited by 36 · Related articles · All 7 versions

Localizing moments in video with natural language

L Anne Hendricks, O Wang… - Proceedings of the …, 2017 - openaccess.thecvf.com

We consider retrieving a specific temporal segment, or moment, from a video given a natural
language text description. Methods designed to retrieve whole video clips with natural …

Cited by 899 · Related articles · All 10 versions