End-to-end dense video captioning with parallel decoding
Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …
Video captioning: a review of theory, techniques and practices.
In today's world, video captioning is extensively used in various applications for specially-
abled and, more specifically, visually abled persons. With advancements in technology for …
A review of deep learning for video captioning
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work
in the fields of computer vision, natural language processing (NLP), linguistics, and human …
Controllable video captioning with POS sequence guidance based on gated fusion network
In this paper, we propose to guide the video caption generation with Part-of-Speech (POS)
information, based on a gated fusion of multiple representations of input videos. We …
Iterative alignment network for continuous sign language recognition
In this paper, we propose an alignment network with iterative optimization for weakly
supervised continuous sign language recognition. Our framework consists of two modules: a …
Object-aware aggregation with bidirectional temporal graph for video captioning
Video captioning aims to automatically generate natural language descriptions of video
content, which has drawn a lot of attention recent years. Generating accurate and fine …
ADAPT: Action-aware driving caption transformer
End-to-end autonomous driving has great potential in the transportation industry. However,
the lack of transparency and interpretability of the automatic decision-making process …
Learning modality interaction for temporal sentence localization and event captioning in videos
Automatically generating sentences to describe events and temporally localizing sentences
in a video are two important tasks that bridge language and videos. Recent techniques …
SibNet: Sibling convolutional encoder for video captioning
Video captioning is a challenging task owing to the complexity of understanding the copious
visual information in videos and describing it using natural language. Different from previous …
Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning
Dense Event Captioning (DEC) aims to jointly localize and describe multiple events
of interest in untrimmed videos, which is an advancement of the conventional video …