Visual question answering: A survey of methods and datasets

Q Wu, D Teney, P Wang, C Shen, A Dick… - Computer Vision and …, 2017 - Elsevier

Visual Question Answering (VQA) is a challenging task that has received increasing
attention from both the computer vision and the natural language processing communities …

Cited by 491 Related articles All 6 versions

Knowledge base graph embedding module design for Visual question answering model

W Zheng, L Yin, X Chen, Z Ma, S Liu, B Yang - Pattern recognition, 2021 - Elsevier

In this paper, a knowledge base graph embedding module is constructed to extend the
versatility of knowledge-based VQA (Visual Question Answering) models. The knowledge …

Cited by 214 Related articles All 4 versions

[PDF] thecvf.com

Don't just assume; look and answer: Overcoming priors for visual question answering

A Agrawal, D Batra, D Parikh… - Proceedings of the …, 2018 - openaccess.thecvf.com

A number of studies have found that today's Visual Question Answering (VQA) models are
heavily driven by superficial correlations in the training data and lack sufficient image …

Cited by 739 Related articles All 7 versions

[PDF] thecvf.com

Tips and tricks for visual question answering: Learnings from the 2017 challenge

D Teney, P Anderson, X He… - Proceedings of the …, 2018 - openaccess.thecvf.com

This paper presents a state-of-the-art model for visual question answering (VQA), which won
the first place in the 2017 VQA Challenge. VQA is a task of significant importance for …

Cited by 487 Related articles All 12 versions

[PDF] arxiv.org

FVQA: Fact-based visual question answering

P Wang, Q Wu, C Shen, A Dick… - IEEE transactions on …, 2017 - ieeexplore.ieee.org

Visual Question Answering (VQA) has attracted much attention in both computer vision and
natural language processing communities, not least because it offers insight into the …

Cited by 569 Related articles All 10 versions

[PDF] thecvf.com

Graph-structured representations for visual question answering

D Teney, L Liu… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

This paper proposes to improve visual question answering (VQA) with structured
representations of both scene contents and questions. A key challenge in VQA is to require …

Cited by 502 Related articles All 10 versions

[PDF] neurips.cc

Out of the box: Reasoning with graph convolution nets for factual visual question answering

M Narasimhan, S Lazebnik… - Advances in neural …, 2018 - proceedings.neurips.cc

Accurately answering a question about a given image requires combining observations with
general knowledge. While this is effortless for humans, reasoning with general knowledge …

Cited by 281 Related articles All 8 versions

[PDF] arxiv.org

Image captioning and visual question answering based on attributes and external knowledge

Q Wu, C Shen, P Wang, A Dick… - IEEE transactions on …, 2017 - ieeexplore.ieee.org

Much of the recent progress in Vision-to-Language problems has been achieved through a
combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks …

Cited by 489 Related articles All 8 versions

[PDF] arxiv.org

Analyzing the behavior of visual question answering models

A Agrawal, D Batra, D Parikh - arXiv preprint arXiv:1606.07356, 2016 - arxiv.org

Recently, a number of deep-learning based models have been proposed for the task of
Visual Question Answering (VQA). The performance of most models is clustered around 60 …

Cited by 382 Related articles All 5 versions

[PDF] arxiv.org

Simple baseline for visual question answering

B Zhou, Y Tian, S Sukhbaatar, A Szlam… - arXiv preprint arXiv …, 2015 - arxiv.org

We describe a very simple bag-of-words baseline for visual question answering. This
baseline concatenates the word features from the question and CNN features from the …

Cited by 427 Related articles All 2 versions