Counterfactual VQA: A cause-effect look at language bias

Y Niu, K Tang, H Zhang, Z Lu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …

Rethinking data augmentation for robust visual question answering

L Chen, Y Zheng, J Xiao - European conference on computer vision, 2022 - Springer
Data Augmentation (DA)—generating extra training samples beyond the original training set—
has been widely used in today's unbiased VQA models to mitigate language biases. Current …

Debiased visual question answering from feature and sample perspectives

Z Wen, G Xu, M Tan, Q Wu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Visual question answering (VQA) is designed to examine the visual-textual reasoning ability
of an intelligent agent. However, recent observations show that many VQA models may only …

Introspective distillation for robust question answering

Y Niu, H Zhang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Question answering (QA) models are well known to exploit data bias, e.g., the language prior
in visual QA and the position bias in reading comprehension. Recent debiasing methods …

Language bias in visual question answering: A survey and taxonomy

D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org
Visual question answering (VQA) is a challenging task, which has attracted more and more
attention in the field of computer vision and natural language processing. However, the …

Visual commonsense in pretrained unimodal and multimodal models

C Zhang, B Van Durme, Z Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Our commonsense knowledge about objects includes their typical visual attributes; we know
that bananas are typically yellow or green, and not purple. Text and image corpora, being …

Test-time model adaptation for visual question answering with debiased self-supervisions

Z Wen, S Niu, G Li, Q Wu, M Tan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual question answering (VQA) is a prevalent task in the real world and plays an essential
role in helping blind people understand the physical world. However, due to the real-world …

Generative bias for robust visual question answering

JW Cho, DJ Kim, H Ryu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The task of Visual Question Answering (VQA) is known to be plagued by the issue
of VQA models exploiting biases within the dataset to make their final predictions. Various …

Loss re-scaling VQA: Revisiting the language prior problem from a class-imbalance view

Y Guo, L Nie, Z Cheng, Q Tian… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recent studies have pointed out that many well-developed Visual Question Answering
(VQA) models are heavily affected by the language prior problem. It refers to making …

COCA: Collaborative causal regularization for audio-visual question answering

M Lao, N Pu, Y Liu, K He, EM Bakker… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Audio-Visual Question Answering (AVQA) is a sophisticated QA task, which aims at
answering textual questions over given video-audio pairs with comprehensive multimodal …