Learning visual knowledge memory networks for visual question answering

X Chang, P Ren, P Xu, Z Li, X Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …

Cited by 314 - Related articles - All 15 versions

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Cited by 23 - Related articles - All 2 versions

Relation-aware graph attention network for visual question answering

L Li, Z Gan, Y Cheng, J Liu - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com

In order to answer semantically-complicated questions about an image, a Visual Question
Answering (VQA) model needs to fully understand the visual scene in the image, especially …

Cited by 412 - Related articles - All 8 versions

Visual commonsense R-CNN

T Wang, J Huang, H Zhang… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

We present a novel unsupervised feature representation learning method, Visual
Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an …

Cited by 297 - Related articles - All 10 versions

KVQA: Knowledge-aware visual question answering

S Shah, A Mishra, N Yadati, PP Talukdar - Proceedings of the AAAI …, 2019 - aaai.org

Visual Question Answering (VQA) has emerged as an important problem spanning
Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In …

Cited by 203 - Related articles - All 8 versions

Re-attention for visual question answering

W Guo, Y Zhang, J Yang, X Yuan - IEEE Transactions on Image …, 2021 - ieeexplore.ieee.org

A simultaneous understanding of questions and images is crucial in Visual Question
Answering (VQA). While the existing models have achieved satisfactory performance by …

Cited by 106 - Related articles - All 10 versions

Dual self-attention with co-attention networks for visual question answering

Y Liu, X Zhang, Q Zhang, C Li, F Huang, X Tang, Z Li - Pattern Recognition, 2021 - Elsevier

Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and aroused wide interests. In previous VQA methods …

Cited by 57 - Related articles - All 2 versions

ACMM: Aligned cross-modal memory for few-shot image and sentence matching

Y Huang, L Wang - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com

Image and sentence matching has drawn much attention recently, but due to the lack of
sufficient pairwise data for training, most previous methods still cannot well associate those …

Cited by 81 - Related articles - All 4 versions

Learning visual commonsense for robust scene graph generation

A Zareian, Z Wang, H You, SF Chang - … Glasgow, UK, August 23–28, 2020 …, 2020 - Springer

Scene graph generation models understand the scene through object and predicate
recognition, but are prone to mistakes due to the challenges of perception in the wild …

Cited by 77 - Related articles - All 5 versions

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier

Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …

Cited by 40 - Related articles - All 2 versions