Say as you wish: Fine-grained control of image caption generation with abstract scene graphs

M Stefanini, M Cornia, L Baraldi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Cited by 378 · All 11 versions

[PDF] arxiv.org

A survey of natural language generation

C Dong, Y Li, H Gong, M Chen, J Li, Y Shen… - ACM Computing …, 2022 - dl.acm.org

This article offers a comprehensive review of the research on Natural Language Generation
(NLG) over the past two decades, especially in relation to data-to-text generation and text-to …

Cited by 220 · All 4 versions

[PDF] thecvf.com

Interactive and explainable region-guided radiology report generation

T Tanida, P Müller, G Kaissis… - Proceedings of the …, 2023 - openaccess.thecvf.com

The automatic generation of radiology reports has the potential to assist radiologists in the
time-consuming task of report writing. Existing methods generate the full report from image …

Cited by 119 · All 6 versions

[PDF] arxiv.org

Panoptic scene graph generation

J Yang, YZ Ang, Z Guo, K Zhou, W Zhang… - European Conference on …, 2022 - Springer

Existing research addresses scene graph generation (SGG), a critical technology for scene
understanding in images, from a detection perspective, i.e., objects are detected using …

Cited by 115 · All 5 versions

[PDF] thecvf.com

Stacked hybrid-attention and group collaborative learning for unbiased scene graph generation

X Dong, T Gan, X Song, J Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Scene Graph Generation, which generally follows a regular encoder-decoder
pipeline, aims to first encode the visual contents within the given image and then parse them …

Cited by 106 · All 6 versions

[PDF] thecvf.com

VisualGPT: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

Cited by 228 · All 12 versions

[PDF] neurips.cc

Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs

P Jin, Y Wu, Y Fan, Z Sun, W Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc

Most text-driven human motion generation methods employ sequential modeling
approaches, e.g., transformers, to extract sentence-level text representations automatically and …

Cited by 23 · All 5 versions

[PDF] arxiv.org

A comprehensive survey of scene graphs: Generation and application

X Chang, P Ren, P Xu, Z Li, X Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

A scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …

Cited by 343 · All 15 versions
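The scene graph described in the snippet above (objects, their attributes, and relationships between objects) can be pictured with a minimal Python sketch. It assumes the common convention of storing relationships as (subject, predicate, object) triples over object indices; the class and method names (`SceneGraph`, `add_object`, `relate`, `triples`) are hypothetical and are not taken from the surveyed paper.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph: object labels, per-object
    attributes, and directed (subject, predicate, object) relations."""
    objects: list[str] = field(default_factory=list)
    attributes: dict[int, list[str]] = field(default_factory=dict)
    # Each relation stores indices into `objects` plus a predicate label.
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name: str, *attrs: str) -> int:
        # Append a node and return its index for use in relations.
        self.objects.append(name)
        idx = len(self.objects) - 1
        self.attributes[idx] = list(attrs)
        return idx

    def relate(self, subj: int, predicate: str, obj: int) -> None:
        # Add a directed edge subject -> object labeled with the predicate.
        self.relations.append((subj, predicate, obj))

    def triples(self) -> list[tuple[str, str, str]]:
        # Resolve index-based relations into readable label triples.
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]


g = SceneGraph()
man = g.add_object("man", "tall")
horse = g.add_object("horse", "brown")
g.relate(man, "riding", horse)
print(g.triples())  # [('man', 'riding', 'horse')]
```

Keeping attributes on nodes and predicates on edges mirrors the object/attribute/relationship split named in the snippet.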

[PDF] arxiv.org

A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …

Cited by 60 · All 3 versions

[PDF] thecvf.com

Human-object interaction detection via disentangled transformer

D Zhou, Z Liu, J Wang, L Wang, T Hu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Human-Object Interaction Detection tackles the problem of joint localization and
classification of human-object interactions. Existing HOI transformers either adopt a single …

Cited by 71 · All 7 versions