A review of deep learning for video captioning
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work
in the fields of computer vision, natural language processing (NLP), linguistics, and human …
Video captioning: a comparative review of where we are and which could be the route
Video captioning is the process of describing the content of a sequence of images, capturing
its semantic relationships and meanings. Dealing with this task with a single image is …
Concept-aware video captioning: Describing videos with effective prior information
Concepts, a collective term for meaningful words that correspond to objects, actions, and
attributes, can act as an intermediary for video captioning. While many efforts have been …
Explainability in graph neural networks: An experimental survey
Graph neural networks (GNNs) have been extensively developed for graph representation
learning in various application domains. However, similar to all other neural networks …
Bridging video and text: A two-step polishing transformer for video captioning
W Xu, Z Miao, J Yu, Y Tian, L Wan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Video captioning is a joint task of computer vision and natural language processing, which
aims to describe the video content using several natural language sentences. Nowadays …
Visual commonsense-aware representation network for video captioning
Generating consecutive descriptions for videos, that is, video captioning, requires taking full
advantage of visual representation along with the generation process. Existing video …
Time–frequency recurrent transformer with diversity constraint for dense video captioning
P Li, P Zhang, T Wang, H Xiao - Information Processing & Management, 2023 - Elsevier
Describing a long video using multiple sentences, i.e., dense video captioning, is a very
challenging task. Existing methods neglect the important fact that the actions of several …
MIR-GAN: Refining frame-level modality-invariant representations with adversarial network for audio-visual speech recognition
Audio-visual speech recognition (AVSR) has recently attracted a surge of research interest by
leveraging multimodal signals to understand human speech. Mainstream approaches …
Multi-sentence video captioning using spatial saliency of video frames and content-oriented beam search algorithm
Video captioning algorithms aim at expressing the information and activities contained in a
video clip in the form of lingual sentences. Most existing video captioning approaches have …
Action knowledge for video captioning with graph neural networks
Many existing video captioning methods capture action information in the video by exploiting
features extracted from an action recognition model. However, directly using the action …