Visual question answering: A survey of methods and datasets

Q Wu, D Teney, P Wang, C Shen, A Dick… - Computer Vision and …, 2017 - Elsevier
Visual Question Answering (VQA) is a challenging task that has received increasing
attention from both the computer vision and the natural language processing communities …

Knowledge base graph embedding module design for Visual question answering model

W Zheng, L Yin, X Chen, Z Ma, S Liu, B Yang - Pattern recognition, 2021 - Elsevier
In this paper, a knowledge base graph embedding module is constructed to extend the
versatility of knowledge-based VQA (Visual Question Answering) models. The knowledge …

Don't just assume; look and answer: Overcoming priors for visual question answering

A Agrawal, D Batra, D Parikh… - Proceedings of the …, 2018 - openaccess.thecvf.com
A number of studies have found that today's Visual Question Answering (VQA) models are
heavily driven by superficial correlations in the training data and lack sufficient image …

Tips and tricks for visual question answering: Learnings from the 2017 challenge

D Teney, P Anderson, X He… - Proceedings of the …, 2018 - openaccess.thecvf.com
This paper presents a state-of-the-art model for visual question answering (VQA), which won
the first place in the 2017 VQA Challenge. VQA is a task of significant importance for …

FVQA: Fact-based visual question answering

P Wang, Q Wu, C Shen, A Dick… - IEEE transactions on …, 2017 - ieeexplore.ieee.org
Visual Question Answering (VQA) has attracted much attention in both computer vision and
natural language processing communities, not least because it offers insight into the …

Graph-structured representations for visual question answering

D Teney, L Liu… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
This paper proposes to improve visual question answering (VQA) with structured
representations of both scene contents and questions. A key challenge in VQA is to require …

Out of the box: Reasoning with graph convolution nets for factual visual question answering

M Narasimhan, S Lazebnik… - Advances in neural …, 2018 - proceedings.neurips.cc
Accurately answering a question about a given image requires combining observations with
general knowledge. While this is effortless for humans, reasoning with general knowledge …

Image captioning and visual question answering based on attributes and external knowledge

Q Wu, C Shen, P Wang, A Dick… - IEEE transactions on …, 2017 - ieeexplore.ieee.org
Much of the recent progress in Vision-to-Language problems has been achieved through a
combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks …

Analyzing the behavior of visual question answering models

A Agrawal, D Batra, D Parikh - arXiv preprint arXiv:1606.07356, 2016 - arxiv.org
Recently, a number of deep-learning based models have been proposed for the task of
Visual Question Answering (VQA). The performance of most models is clustered around 60 …

Simple baseline for visual question answering

B Zhou, Y Tian, S Sukhbaatar, A Szlam… - arXiv preprint arXiv …, 2015 - arxiv.org
We describe a very simple bag-of-words baseline for visual question answering. This
baseline concatenates the word features from the question and CNN features from the …