Language bias in visual question answering: A survey and taxonomy

D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org
Visual question answering (VQA) is a challenging task that has attracted increasing
attention in the fields of computer vision and natural language processing. However, the …

Test-time model adaptation for visual question answering with debiased self-supervisions

Z Wen, S Niu, G Li, Q Wu, M Tan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual question answering (VQA) is a prevalent real-world task and plays an essential
role in helping the blind understand the physical world. However, due to the real-world …

See and learn more: Dense caption-aware representation for visual question answering

Y Bi, H Jiang, Y Hu, Y Sun, B Yin - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the rapid development of deep learning models, great improvements have been
achieved in the Visual Question Answering (VQA) field. However, modern VQA models are …

A critical analysis of benchmarks, techniques, and models in medical visual question answering

S Al-Hadhrami, MEB Menai, S Al-Ahmadi… - IEEE …, 2023 - ieeexplore.ieee.org
This paper comprehensively reviews medical VQA models, structures, and datasets,
focusing on combining vision and language. Over 75 models and their statistical and SWOT …

Dbiased-p: Dual-biased predicate predictor for unbiased scene graph generation

X Han, X Song, X Dong, Y Wei, M Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Scene Graph Generation (SGG) aims to abstract the objects and their semantic relationships
within a given image. Current SGG performance is mainly limited by the biased predicate …

Question-conditioned debiasing with focal visual context fusion for visual question answering

J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier
Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …

Recovering generalization via pre-training-like knowledge distillation for out-of-distribution visual question answering

Y Song, X Yang, Y Wang, C Xu - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the emergence of large-scale multi-modal foundation models, significant improvements
have been made in Visual Question Answering (VQA) in recent years via the “Pre …

Overcoming language priors for visual question answering via loss rebalancing label and global context

R Cao, Z Li - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press
Despite the advances in Visual Question Answering (VQA), many VQA models currently
suffer from language priors (i.e., generating answers directly from questions without using …

Neural logic vision language explainer

X Yang, F Liu, G Lin - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
If we compare how humans reason and how deep models reason, humans reason in a
symbolic manner with a formal language called logic, while most deep models reason in …

LOIS: looking out of instance semantics for visual question answering

S Zhang, Y Chen, Y Sun, F Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual question answering (VQA) has been intensively studied as a multimodal task,
requiring efforts to bridge vision and language for correct answer inference. Recent attempts …