X-GGM: Graph generative modeling for out-of-distribution generalization in visual question answering
Encouraging progress has been made towards Visual Question Answering (VQA) in recent
years, but it is still challenging to enable VQA models to adaptively generalize to out-of …
Show, ask, attend, and answer: A strong baseline for visual question answering
V Kazemi, A Elqursh - arXiv preprint arXiv:1704.03162, 2017 - arxiv.org
This paper presents a new baseline for visual question answering task. Given an image and
a question in natural language, our model produces accurate answers according to the …
Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering
Q Lu, S Chen, X Zhu - Journal of Imaging, 2024 - mdpi.com
Language bias stands as a noteworthy concern in visual question answering (VQA),
wherein models tend to rely on spurious correlations between questions and answers for …
Language bias in visual question answering: A survey and taxonomy
D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org
Visual question answering (VQA) is a challenging task, which has attracted more and more
attention in the field of computer vision and natural language processing. However, the …
Improving visual question answering with pre-trained language modeling
Y Wu, H Gao, L Chen - Fifth International Workshop on Pattern …, 2020 - spiedigitallibrary.org
Visual question answering is a task of significant importance for research in artificial
intelligence. However, most studies often use simple gated recurrent units (GRU) to extract …
Overcoming language priors in VQA via adding visual module
J Zhao, X Zhang, X Wang, Y Yang, G Sun - Neural Computing and …, 2022 - Springer
Visual Question Answering (VQA) is a new and popular research direction. Dealing
with language prior problems has become a hot topic in VQA in the past two years. With the …
RUBi: Reducing unimodal biases for visual question answering
Visual Question Answering (VQA) is the task of answering questions about an
image. Some VQA models often exploit unimodal biases to provide the correct answer …
A Visual Question Answering Network Merging High-and Low-Level Semantic Information
Visual Question Answering (VQA) usually uses deep attention mechanisms to learn fine-
grained visual content of images and textual content of questions. However, the deep …
Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering
B Yuan, S You, BK Bao - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
Pretraining and finetuning large vision-language models (VLMs) have achieved remarkable
success in visual question answering (VQA). However, finetuning VLMs requires heavy …
Visual perturbation-aware collaborative learning for overcoming the language prior problem
Several studies have recently pointed that existing Visual Question Answering (VQA)
models heavily suffer from the language prior problem, which refers to capturing superficial …