Scene graph generation: A comprehensive survey

A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …

Cited by 65 · Related articles · All 3 versions

Constructing maps for autonomous robotics: An introductory conceptual overview

P Racinskis, J Arents, M Greitans - Electronics, 2023 - mdpi.com

Mapping the environment is a powerful technique for enabling autonomy through
localization and planning in robotics. This article seeks to provide a global overview of …

Cited by 10 · Related articles · All 6 versions

Foundations of spatial perception for robotics: Hierarchical representations and real-time systems

N Hughes, Y Chang, S Hu, R Talak… - … Journal of Robotics …, 2024 - journals.sagepub.com

3D spatial perception is the problem of building and maintaining an actionable and
persistent representation of the environment in real-time using sensor data and prior …

Cited by 39 · Related articles · All 3 versions

Pair then relation: Pair-net for panoptic scene graph generation

J Wang, Z Wen, X Li, Z Guo, J Yang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that
aims to create a more comprehensive scene graph representation using panoptic …

Cited by 15 · Related articles · All 3 versions

Hilo: Exploiting high low frequency relations for unbiased panoptic scene graph generation

Z Zhou, M Shi, H Caesar - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Panoptic Scene Graph generation (PSG) is a recently proposed task in image
scene understanding that aims to segment the image and extract triplets of subjects, objects …

Cited by 18 · Related articles · All 9 versions

Emergent visual-semantic hierarchies in image-text representations

M Alper, H Averbuch-Elor - European Conference on Computer Vision, 2025 - Springer

While recent vision-and-language models (VLMs) like CLIP are a powerful tool for analyzing
text and images in a shared semantic space, they do not explicitly model the hierarchical …

Cited by 4 · Related articles · All 6 versions

Learning situation hyper-graphs for video question answering

A Urooj, H Kuehne, B Wu, K Chheu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Answering questions about complex situations in videos requires not only capturing the
presence of actors, objects, and their relations, but also the evolution of these relationships …

Cited by 16 · Related articles · All 7 versions

Synthesizing event-centric knowledge graphs of daily activities using virtual space

S Egami, T Ugai, M Oono, K Kitamura, K Fukuda - IEEE Access, 2023 - ieeexplore.ieee.org

Artificial intelligence (AI) is expected to be embodied in software agents, robots, and cyber-
physical systems that can understand the various contextual information of daily life in the …

Cited by 21 · Related articles · All 7 versions

More knowledge, less bias: Unbiasing scene graph generation with explicit ontological adjustment

Z Chen, S Rezayi, S Li - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Scene graph generation (SGG) models seek to detect relationships between objects in a
given image. One challenge in this area is the biased distribution of predicates in the dataset …

Cited by 21 · Related articles · All 3 versions

Reefknot: A comprehensive benchmark for relation hallucination evaluation, analysis and mitigation in multimodal large language models

K Zheng, J Chen, Y Yan, X Zou, X Hu - arXiv preprint arXiv:2408.09429, 2024 - arxiv.org

Hallucination issues persistently plague current multimodal large language models
(MLLMs). While existing research primarily focuses on object-level or attribute-level …

Cited by 7 · Related articles · All 2 versions