Video captioning with transferred semantic attributes

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org

Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …

Cited by 255 - Related articles - All 10 versions

Video description: A comprehensive survey of deep learning approaches

G Rafiq, M Rafiq, GS Choi - Artificial Intelligence Review, 2023 - Springer

Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …

Cited by 26 - Related articles - All 5 versions

Vid2seq: Large-scale pretraining of a visual language model for dense video captioning

A Yang, A Nagrani, PH Seo, A Miech… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …

Cited by 225 - Related articles - All 26 versions

Full-duplex strategy for video object segmentation

GP Ji, K Fu, Z Wu, DP Fan, J Shen… - Proceedings of the …, 2021 - openaccess.thecvf.com

Appearance and motion are two important sources of information in video object
segmentation (VOS). Previous methods mainly focus on using simplex solutions, lowering …

Cited by 158 - Related articles - All 13 versions

Exploring visual relationship for image captioning

T Yao, Y Pan, Y Li, T Mei - Proceedings of the European …, 2018 - openaccess.thecvf.com

It is always well believed that modeling relationships between objects would be helpful for
representing and eventually describing an image. Nevertheless, there has not been …

Cited by 1062 - Related articles - All 9 versions

Shifting more attention to video salient object detection

DP Fan, W Wang, MM Cheng… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

The last decade has witnessed a growing interest in video salient object detection (VSOD).
However, the research community long-term lacked a well-established VSOD dataset …

Cited by 538 - Related articles - All 8 versions

Vidchapters-7m: Video chapters at scale

A Yang, A Nagrani, I Laptev, J Sivic… - Advances in Neural …, 2024 - proceedings.neurips.cc

Segmenting untrimmed videos into chapters enables users to quickly navigate to the
information of their interest. This important topic has been understudied due to the lack of …

Cited by 25 - Related articles - All 19 versions

End-to-end dense video captioning with masked transformer

L Zhou, Y Zhou, JJ Corso… - Proceedings of the …, 2018 - openaccess.thecvf.com

Dense video captioning aims to generate text descriptions for all events in an untrimmed
video. This involves both detecting and describing events. Therefore, all previous methods …

Cited by 691 - Related articles - All 8 versions

Dynamic context-sensitive filtering network for video salient object detection

M Zhang, J Liu, Y Wang, Y Piao… - Proceedings of the …, 2021 - openaccess.thecvf.com

The ability to capture inter-frame dynamics has been critical to the development of video
salient object detection (VSOD). While many works have achieved great success in this field …

Cited by 117 - Related articles - All 4 versions

Video captioning with attention-based LSTM and semantic consistency

L Gao, Z Guo, H Zhang, X Xu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org

Recent progress in using long short-term memory (LSTM) for image captioning has
motivated the exploration of their applications for video captioning. By taking a video as a …

Cited by 680 - Related articles - All 4 versions