Suppressing biased samples for robust VQA

D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org

Visual question answering (VQA) is a challenging task, which has attracted more and more
attention in the field of computer vision and natural language processing. However, the …

Cited by 15 - Related articles - All 2 versions

Test-time model adaptation for visual question answering with debiased self-supervisions

Z Wen, S Niu, G Li, Q Wu, M Tan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Visual question answering (VQA) is a prevalent task in the real world, and plays an essential
role in helping the blind understand the physical world. However, due to the real-world …

Cited by 11 - Related articles - All 2 versions

See and learn more: Dense caption-aware representation for visual question answering

Y Bi, H Jiang, Y Hu, Y Sun, B Yin - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the rapid development of deep learning models, great improvements have been
achieved in the Visual Question Answering (VQA) field. However, modern VQA models are …

Cited by 8 - Related articles - All 2 versions

A critical analysis of benchmarks, techniques, and models in medical visual question answering

S Al-Hadhrami, MEB Menai, S Al-Ahmadi… - IEEE …, 2023 - ieeexplore.ieee.org

This paper comprehensively reviews medical VQA models, structures, and datasets,
focusing on combining vision and language. Over 75 models and their statistical and SWOT …

Cited by 1 - Related articles - All 2 versions

Dbiased-p: Dual-biased predicate predictor for unbiased scene graph generation

X Han, X Song, X Dong, Y Wei, M Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Scene Graph Generation (SGG) is to abstract the objects and their semantic relationships
within a given image. Current SGG performance is mainly limited by the biased predicate …

Cited by 13 - Related articles - All 3 versions

Question-conditioned debiasing with focal visual context fusion for visual question answering

J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier

Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …

Cited by 4 - Related articles - All 3 versions

Recovering generalization via pre-training-like knowledge distillation for out-of-distribution visual question answering

Y Song, X Yang, Y Wang, C Xu - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the emergence of large-scale multi-modal foundation models, significant improvements
have been made towards Visual Question Answering (VQA) in recent years via the “Pre …

Cited by 7 - Related articles - All 2 versions

Overcoming language priors for visual question answering via loss rebalancing label and global context

R Cao, Z Li - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press

Despite the advances in Visual Question Answering (VQA), many VQA models currently
suffer from language priors (i.e., generating answers directly from questions without using …

Cited by 2 - Related articles - All 5 versions

Neural logic vision language explainer

X Yang, F Liu, G Lin - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org

If we compare how humans reason and how deep models reason, humans reason in a
symbolic manner with a formal language called logic, while most deep models reason in …

Cited by 1 - Related articles - All 3 versions

LOIS: looking out of instance semantics for visual question answering

S Zhang, Y Chen, Y Sun, F Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Visual question answering (VQA) has been intensively studied as a multimodal task,
requiring efforts to bridge vision and language for correct answer inference. Recent attempts …

Cited by 2 - Related articles - All 5 versions