A comprehensive survey of scene graphs: Generation and application

X Chang, P Ren, P Xu, Z Li, X Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Relation-aware graph attention network for visual question answering

L Li, Z Gan, Y Cheng, J Liu - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
In order to answer semantically-complicated questions about an image, a Visual Question
Answering (VQA) model needs to fully understand the visual scene in the image, especially …

Visual commonsense R-CNN

T Wang, J Huang, H Zhang… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We present a novel unsupervised feature representation learning method, Visual
Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an …

KVQA: Knowledge-aware visual question answering

S Shah, A Mishra, N Yadati, PP Talukdar - Proceedings of the AAAI …, 2019 - aaai.org
Visual Question Answering (VQA) has emerged as an important problem spanning
Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In …

Re-attention for visual question answering

W Guo, Y Zhang, J Yang, X Yuan - IEEE Transactions on Image …, 2021 - ieeexplore.ieee.org
A simultaneous understanding of questions and images is crucial in Visual Question
Answering (VQA). While the existing models have achieved satisfactory performance by …

Dual self-attention with co-attention networks for visual question answering

Y Liu, X Zhang, Q Zhang, C Li, F Huang, X Tang, Z Li - Pattern Recognition, 2021 - Elsevier
Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and aroused wide interests. In previous VQA methods …

ACMM: Aligned cross-modal memory for few-shot image and sentence matching

Y Huang, L Wang - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
Image and sentence matching has drawn much attention recently, but due to the lack of
sufficient pairwise data for training, most previous methods still cannot well associate those …

Learning visual commonsense for robust scene graph generation

A Zareian, Z Wang, H You, SF Chang - … Glasgow, UK, August 23–28, 2020 …, 2020 - Springer
Scene graph generation models understand the scene through object and predicate
recognition, but are prone to mistakes due to the challenges of perception in the wild …

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier
Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …