Language bias in visual question answering: A survey and taxonomy
D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org
Visual question answering (VQA) is a challenging task, which has attracted more and more
attention in the field of computer vision and natural language processing. However, the …
Test-time model adaptation for visual question answering with debiased self-supervisions
Visual question answering (VQA) is a prevalent task in real-world, and plays an essential
role in helping the blind understand the physical world. However, due to the real-world …
See and learn more: Dense caption-aware representation for visual question answering
With the rapid development of deep learning models, great improvements have been
achieved in the Visual Question Answering (VQA) field. However, modern VQA models are …
A critical analysis of benchmarks, techniques, and models in medical visual question answering
This paper comprehensively reviews medical VQA models, structures, and datasets,
focusing on combining vision and language. Over 75 models and their statistical and SWOT …
Dbiased-p: Dual-biased predicate predictor for unbiased scene graph generation
Scene Graph Generation (SGG) is to abstract the objects and their semantic relationships
within a given image. Current SGG performance is mainly limited by the biased predicate …
Question-conditioned debiasing with focal visual context fusion for visual question answering
J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier
Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …
Recovering generalization via pre-training-like knowledge distillation for out-of-distribution visual question answering
With the emergence of large-scale multi-modal foundation models, significant improvements
have been made towards Visual Question Answering (VQA) in recent years via the “Pre …
Overcoming language priors for visual question answering via loss rebalancing label and global context
R Cao, Z Li - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press
Despite the advances in Visual Question Answering (VQA), many VQA models currently
suffer from language priors (i.e., generating answers directly from questions without using …
Neural logic vision language explainer
If we compare how humans reason and how deep models reason, humans reason in a
symbolic manner with a formal language called logic, while most deep models reason in …
LOIS: looking out of instance semantics for visual question answering
S Zhang, Y Chen, Y Sun, F Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual question answering (VQA) has been intensively studied as a multimodal task,
requiring efforts to bridge vision and language for correct answer inference. Recent attempts …