Video description: A survey of methods, datasets, and evaluation metrics
Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …
Video description: A comprehensive survey of deep learning approaches
Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …
Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily available at scale. The Vid2Seq …
Full-duplex strategy for video object segmentation
Appearance and motion are two important sources of information in video object
segmentation (VOS). Previous methods mainly focus on using simplex solutions, lowering …
Exploring visual relationship for image captioning
It is always well believed that modeling relationships between objects would be helpful for
representing and eventually describing an image. Nevertheless, there has not been …
Shifting more attention to video salient object detection
The last decade has witnessed a growing interest in video salient object detection (VSOD).
However, the research community long-term lacked a well-established VSOD dataset …
VidChapters-7M: Video chapters at scale
Segmenting untrimmed videos into chapters enables users to quickly navigate to the
information of their interest. This important topic has been understudied due to the lack of …
End-to-end dense video captioning with masked transformer
Dense video captioning aims to generate text descriptions for all events in an untrimmed
video. This involves both detecting and describing events. Therefore, all previous methods …
Dynamic context-sensitive filtering network for video salient object detection
The ability to capture inter-frame dynamics has been critical to the development of video
salient object detection (VSOD). While many works have achieved great success in this field …
Video captioning with attention-based LSTM and semantic consistency
Recent progress in using long short-term memory (LSTM) for image captioning has
motivated the exploration of their applications for video captioning. By taking a video as a …