Counterfactual VQA: A cause-effect look at language bias
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …
Rethinking data augmentation for robust visual question answering
Data Augmentation (DA)—generating extra training samples beyond the original training set—
has been widely used in today's unbiased VQA models to mitigate language biases. Current …
Debiased visual question answering from feature and sample perspectives
Visual question answering (VQA) is designed to examine the visual-textual reasoning ability
of an intelligent agent. However, recent observations show that many VQA models may only …
Introspective distillation for robust question answering
Question answering (QA) models are well known to exploit data bias, e.g., the language prior
in visual QA and the position bias in reading comprehension. Recent debiasing methods …
Language bias in visual question answering: A survey and taxonomy
D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org
Visual question answering (VQA) is a challenging task, which has attracted increasing
attention in the fields of computer vision and natural language processing. However, the …
Visual commonsense in pretrained unimodal and multimodal models
Our commonsense knowledge about objects includes their typical visual attributes; we know
that bananas are typically yellow or green, and not purple. Text and image corpora, being …
Test-time model adaptation for visual question answering with debiased self-supervisions
Visual question answering (VQA) is a prevalent real-world task that plays an essential
role in helping blind people understand the physical world. However, due to the real-world …
Generative bias for robust visual question answering
Abstract The task of Visual Question Answering (VQA) is known to be plagued by the issue
of VQA models exploiting biases within the dataset to make their final predictions. Various …
Loss re-scaling VQA: Revisiting the language prior problem from a class-imbalance view
Recent studies have pointed out that many well-developed Visual Question Answering
(VQA) models are heavily affected by the language prior problem. It refers to making …
Coca: Collaborative causal regularization for audio-visual question answering
Abstract Audio-Visual Question Answering (AVQA) is a sophisticated QA task that aims at
answering textual questions over given video-audio pairs with comprehensive multimodal …