Scene graph generation: A comprehensive survey

G Zhu, L Zhang, Y Jiang, Y Dang, H Hou… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning techniques have led to remarkable breakthroughs in the field of generic
object detection and have spawned a lot of scene-understanding tasks in recent years …

Scene graph generation: A comprehensive survey

H Li, G Zhu, L Zhang, Y Jiang, Y Dang, H Hou, P Shen… - Neurocomputing, 2024 - Elsevier
Deep learning techniques have led to remarkable breakthroughs in the field of object
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …

What to look at and where: Semantic and spatial refined transformer for detecting human-object interactions

ASM Iftekhar, H Chen, K Kundu, X Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
We propose a novel one-stage Transformer-based semantic and spatial refined transformer
(SSRT) to solve the Human-Object Interaction detection task, which requires localizing …

Webly supervised knowledge-embedded model for visual reasoning

W Zheng, L Yan, W Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual reasoning between visual images and natural language remains a long-standing
challenge in computer vision. Conventional deep supervision methods aim at finding …

SGPT: The secondary path guides the primary path in transformers for HOI detection

S Chan, W Wang, Z Shao, C Bai - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
HOI detection is essential for human-computer interaction, especially in behavior detection
and robot manipulation. Existing mainstream transformer methods for HOI detection are …

Knowledge-Embedded Mutual Guidance for Visual Reasoning

W Zheng, L Yan, L Chen, Q Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual reasoning between visual images and natural language is a long-standing challenge
in computer vision. Most of the methods aim to look for answers to questions only on the …

A symmetric fusion learning model for detecting visual relations and scene parsing

X Liu, X Jing, Z Zheng, W Du, X Ding… - Scientific …, 2022 - Wiley Online Library
Visual relationship detection (VRD) aims to locate objects and recognize their pairwise
relationships for parsing scene graphs. To enable a higher understanding of the visual …