Video captioning with guidance of multimodal latent topics

T Wang, R Zhang, Z Lu, F Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com

Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …

Cited by 209 Related articles All 6 versions

Video captioning: a review of theory, techniques and practices.

V Jain, F Al-Turjman, G Chaudhary… - Multimedia Tools & …, 2022 - search.ebscohost.com

In today's world, video captioning is extensively used in various applications for specially-
abled and, more specifically, visually abled persons. With advancements in technology for …

Cited by 31 Related articles All 3 versions

[PDF] arxiv.org

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi, F Pourpanah… - arXiv preprint arXiv …, 2023 - arxiv.org

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work
in the fields of computer vision, natural language processing (NLP), linguistics, and human …

Cited by 21 Related articles All 3 versions

[PDF] thecvf.com

Controllable video captioning with pos sequence guidance based on gated fusion network

B Wang, L Ma, W Zhang, W Jiang… - Proceedings of the …, 2019 - openaccess.thecvf.com

In this paper, we propose to guide the video caption generation with Part-of-Speech (POS)
information, based on a gated fusion of multiple representations of input videos. We …

Cited by 222 Related articles All 5 versions

[PDF] thecvf.com

Iterative alignment network for continuous sign language recognition

J Pu, W Zhou, H Li - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com

In this paper, we propose an alignment network with iterative optimization for weakly
supervised continuous sign language recognition. Our framework consists of two modules: a …

Cited by 270 Related articles All 6 versions

[PDF] thecvf.com

Object-aware aggregation with bidirectional temporal graph for video captioning

J Zhang, Y Peng - Proceedings of the IEEE/CVF conference …, 2019 - openaccess.thecvf.com

Video captioning aims to automatically generate natural language descriptions of video
content, which has drawn a lot of attention recent years. Generating accurate and fine …

Cited by 216 Related articles All 6 versions

[PDF] arxiv.org

Adapt: Action-aware driving caption transformer

B Jin, X Liu, Y Zheng, P Li, H Zhao… - … on Robotics and …, 2023 - ieeexplore.ieee.org

End-to-end autonomous driving has great potential in the transportation industry. However,
the lack of transparency and interpretability of the automatic decision-making process …

Cited by 65 Related articles All 5 versions

[PDF] arxiv.org

Learning modality interaction for temporal sentence localization and event captioning in videos

S Chen, W Jiang, W Liu, YG Jiang - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer

Automatically generating sentences to describe events and temporally localizing sentences
in a video are two important tasks that bridge language and videos. Recent techniques …

Cited by 111 Related articles All 5 versions

[PDF] buffalo.edu

Sibnet: Sibling convolutional encoder for video captioning

S Liu, Z Ren, J Yuan - Proceedings of the 26th ACM international …, 2018 - dl.acm.org

Video captioning is a challenging task owing to the complexity of understanding the copious
visual information in videos and describing it using natural language. Different from previous …

Cited by 175 Related articles All 8 versions

[PDF] thecvf.com

Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning

S Chen, YG Jiang - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com

Dense Event Captioning (DEC) aims to jointly localize and describe multiple events
of interest in untrimmed videos, which is an advancement of the conventional video …

Cited by 77 Related articles All 4 versions