Video description: A survey of methods, datasets, and evaluation metrics

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org
Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …

Video description: A comprehensive survey of deep learning approaches

G Rafiq, M Rafiq, GS Choi - Artificial Intelligence Review, 2023 - Springer
Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …

Vid2seq: Large-scale pretraining of a visual language model for dense video captioning

A Yang, A Nagrani, PH Seo, A Miech… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …

Full-duplex strategy for video object segmentation

GP Ji, K Fu, Z Wu, DP Fan, J Shen… - Proceedings of the …, 2021 - openaccess.thecvf.com
Appearance and motion are two important sources of information in video object
segmentation (VOS). Previous methods mainly focus on using simplex solutions, lowering …

Exploring visual relationship for image captioning

T Yao, Y Pan, Y Li, T Mei - Proceedings of the European …, 2018 - openaccess.thecvf.com
It is always well believed that modeling relationships between objects would be helpful for
representing and eventually describing an image. Nevertheless, there has not been …

Shifting more attention to video salient object detection

DP Fan, W Wang, MM Cheng… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
The last decade has witnessed a growing interest in video salient object detection (VSOD).
However, the research community long-term lacked a well-established VSOD dataset …

Vidchapters-7m: Video chapters at scale

A Yang, A Nagrani, I Laptev, J Sivic… - Advances in Neural …, 2024 - proceedings.neurips.cc
Segmenting untrimmed videos into chapters enables users to quickly navigate to the
information of their interest. This important topic has been understudied due to the lack of …

End-to-end dense video captioning with masked transformer

L Zhou, Y Zhou, JJ Corso… - Proceedings of the …, 2018 - openaccess.thecvf.com
Dense video captioning aims to generate text descriptions for all events in an untrimmed
video. This involves both detecting and describing events. Therefore, all previous methods …

Dynamic context-sensitive filtering network for video salient object detection

M Zhang, J Liu, Y Wang, Y Piao… - Proceedings of the …, 2021 - openaccess.thecvf.com
The ability to capture inter-frame dynamics has been critical to the development of video
salient object detection (VSOD). While many works have achieved great success in this field …

Video captioning with attention-based LSTM and semantic consistency

L Gao, Z Guo, H Zhang, X Xu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Recent progress in using long short-term memory (LSTM) for image captioning has
motivated the exploration of their applications for video captioning. By taking a video as a …