From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, ie describing images …

A survey of natural language generation

C Dong, Y Li, H Gong, M Chen, J Li, Y Shen… - ACM Computing …, 2022 - dl.acm.org
This article offers a comprehensive review of the research on Natural Language Generation
(NLG) over the past two decades, especially in relation to data-to-text generation and text-to …

Interactive and explainable region-guided radiology report generation

T Tanida, P Müller, G Kaissis… - Proceedings of the …, 2023 - openaccess.thecvf.com
The automatic generation of radiology reports has the potential to assist radiologists in the
time-consuming task of report writing. Existing methods generate the full report from image …

Panoptic scene graph generation

J Yang, YZ Ang, Z Guo, K Zhou, W Zhang… - European Conference on …, 2022 - Springer
Existing research addresses scene graph generation (SGG)—a critical technology for scene
understanding in images—from a detection perspective, ie., objects are detected using …

Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation

X Dong, T Gan, X Song, J Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Abstract Scene Graph Generation, which generally follows a regular encoder-decoder
pipeline, aims to first encode the visual contents within the given image and then parse them …

Visualgpt: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs

P Jin, Y Wu, Y Fan, Z Sun, W Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Most text-driven human motion generation methods employ sequential modeling
approaches, eg, transformer, to extract sentence-level text representations automatically and …

A comprehensive survey of scene graphs: Generation and application

X Chang, P Ren, P Xu, Z Li, X Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …

A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (eg, social network …

Human-object interaction detection via disentangled transformer

D Zhou, Z Liu, J Wang, L Wang, T Hu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Abstract Human-Object Interaction Detection tackles the problem of joint localization and
classification of human object interactions. Existing HOI transformers either adopt a single …