X-GGM: Graph generative modeling for out-of-distribution generalization in visual question answering

J Jiang, Z Liu, Y Liu, Z Nan, N Zheng - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Encouraging progress has been made towards Visual Question Answering (VQA) in recent
years, but it is still challenging to enable VQA models to adaptively generalize to out-of …

Show, ask, attend, and answer: A strong baseline for visual question answering

V Kazemi, A Elqursh - arXiv preprint arXiv:1704.03162, 2017 - arxiv.org
This paper presents a new baseline for the visual question answering task. Given an image and
a question in natural language, our model produces accurate answers according to the …

Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering

Q Lu, S Chen, X Zhu - Journal of Imaging, 2024 - mdpi.com
Language bias stands as a noteworthy concern in visual question answering (VQA),
wherein models tend to rely on spurious correlations between questions and answers for …

Language bias in visual question answering: A survey and taxonomy

D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org
Visual question answering (VQA) is a challenging task that has attracted increasing
attention in the fields of computer vision and natural language processing. However, the …

Improving visual question answering with pre-trained language modeling

Y Wu, H Gao, L Chen - Fifth International Workshop on Pattern …, 2020 - spiedigitallibrary.org
Visual question answering is a task of significant importance for research in artificial
intelligence. However, most studies use simple gated recurrent units (GRUs) to extract …

Overcoming language priors in VQA via adding visual module

J Zhao, X Zhang, X Wang, Y Yang, G Sun - Neural Computing and …, 2022 - Springer
Visual Question Answering (VQA) is a new and popular research direction. Dealing
with language prior problems has become a hot topic in VQA in the past two years. With the …

RUBi: Reducing unimodal biases for visual question answering

R Cadene, C Dancette, M Cord… - Advances in neural …, 2019 - proceedings.neurips.cc
Visual Question Answering (VQA) is the task of answering questions about an
image. Some VQA models often exploit unimodal biases to provide the correct answer …

A Visual Question Answering Network Merging High- and Low-Level Semantic Information

H Li, D Han, C Chen, CC Chang, KC Li… - … on Information and …, 2023 - search.ieice.org
Visual Question Answering (VQA) usually uses deep attention mechanisms to learn the fine-
grained visual content of images and the textual content of questions. However, the deep …

Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering

B Yuan, S You, BK Bao - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
Pretraining and finetuning large vision-language models (VLMs) have achieved remarkable
success in visual question answering (VQA). However, finetuning VLMs requires heavy …

Visual perturbation-aware collaborative learning for overcoming the language prior problem

Y Han, L Nie, J Yin, J Wu, Y Yan - arXiv preprint arXiv:2207.11850, 2022 - arxiv.org
Several studies have recently pointed out that existing Visual Question Answering (VQA)
models heavily suffer from the language prior problem, which refers to capturing superficial …