Rethinking data augmentation for robust visual question answering

L Chen, Y Zheng, J Xiao - European conference on computer vision, 2022 - Springer
Data Augmentation (DA)—generating extra training samples beyond the original training set—
has been widely used in today's unbiased VQA models to mitigate language biases. Current …

Language bias in visual question answering: A survey and taxonomy

D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org
Visual question answering (VQA) is a challenging task that has attracted increasing
attention in the fields of computer vision and natural language processing. However, the …

A critical analysis of benchmarks, techniques, and models in medical visual question answering

S Al-Hadhrami, MEB Menai, S Al-Ahmadi… - IEEE …, 2023 - ieeexplore.ieee.org
This paper comprehensively reviews medical VQA models, structures, and datasets,
focusing on combining vision and language. Over 75 models and their statistical and SWOT …

Towards robust visual question answering: Making the most of biased samples via contrastive learning

Q Si, Y Liu, F Meng, Z Lin, P Fu, Y Cao, W Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Models for Visual Question Answering (VQA) often rely on spurious correlations, i.e., the
language priors, that appear in the biased samples of the training set, which make them brittle …

Be flexible! Learn to debias by sampling and prompting for robust visual question answering

J Liu, CF Fan, F Zhou, H Xu - Information Processing & Management, 2023 - Elsevier
Recent studies point out that VQA models tend to rely on the language prior in the training
data to answer the questions, which prevents the VQA model from generalization on the out …

Question-conditioned debiasing with focal visual context fusion for visual question answering

J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier
Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …

Language prior is not the only shortcut: A benchmark for shortcut learning in vqa

Q Si, F Meng, M Zheng, Z Lin, Y Liu, P Fu… - arXiv preprint arXiv …, 2022 - arxiv.org
Visual Question Answering (VQA) models are prone to learn the shortcut solution formed by
dataset biases rather than the intended solution. To evaluate the VQA models' reasoning …

Digging out discrimination information from generated samples for robust visual question answering

Z Wen, Y Wang, M Tan, Q Wu, Q Wu - Findings of the Association …, 2023 - aclanthology.org
Visual Question Answering (VQA) aims to answer a textual question based on a
given image. Nevertheless, recent studies have shown that VQA models tend to capture the …

Overcoming language priors for visual question answering via loss rebalancing label and global context

R Cao, Z Li - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press
Despite the advances in Visual Question Answering (VQA), many VQA models currently
suffer from language priors (i.e., generating answers directly from questions without using …

Robust visual question answering: Datasets, methods, and future challenges

J Ma, P Wang, D Kong, Z Wang, J Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual question answering requires a system to provide an accurate natural language
answer given an image and a natural language question. However, it is widely recognized …