From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
A survey of natural language generation
This article offers a comprehensive review of the research on Natural Language Generation
(NLG) over the past two decades, especially in relation to data-to-text generation and text-to …
Interactive and explainable region-guided radiology report generation
The automatic generation of radiology reports has the potential to assist radiologists in the
time-consuming task of report writing. Existing methods generate the full report from image …
Panoptic scene graph generation
Existing research addresses scene graph generation (SGG)—a critical technology for scene
understanding in images—from a detection perspective, i.e., objects are detected using …
Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation
Scene Graph Generation, which generally follows a regular encoder-decoder
pipeline, aims to first encode the visual contents within the given image and then parse them …
VisualGPT: Data-efficient adaptation of pretrained language models for image captioning
The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …
Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs
Most text-driven human motion generation methods employ sequential modeling
approaches, e.g., transformer, to extract sentence-level text representations automatically and …
A comprehensive survey of scene graphs: Generation and application
Scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …
A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …
Human-object interaction detection via disentangled transformer
Human-Object Interaction Detection tackles the problem of joint localization and
classification of human object interactions. Existing HOI transformers either adopt a single …