Counterfactual VQA: A cause-effect look at language bias
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …
Rethinking data augmentation for robust visual question answering
Data Augmentation (DA)—generating extra training samples beyond the original training set—
has been widely used in today's unbiased VQA models to mitigate language biases. Current …
Debiased visual question answering from feature and sample perspectives
Visual question answering (VQA) is designed to examine the visual-textual reasoning ability
of an intelligent agent. However, recent observations show that many VQA models may only …
Introspective distillation for robust question answering
Question answering (QA) models are well known to exploit data bias, e.g., the language prior
in visual QA and the position bias in reading comprehension. Recent debiasing methods …
Language bias in visual question answering: A survey and taxonomy
D Yuan - arXiv preprint arXiv:2111.08531, 2021 - arxiv.org
Visual question answering (VQA) is a challenging task, which has attracted increasing
attention in the fields of computer vision and natural language processing. However, the …
Visual commonsense in pretrained unimodal and multimodal models
Our commonsense knowledge about objects includes their typical visual attributes; we know
that bananas are typically yellow or green, and not purple. Text and image corpora, being …
Test-time model adaptation for visual question answering with debiased self-supervisions
Visual question answering (VQA) is a prevalent real-world task that plays an essential
role in helping blind people understand the physical world. However, due to the real-world …
Generative bias for robust visual question answering
Abstract The task of Visual Question Answering (VQA) is known to be plagued by the issue
of VQA models exploiting biases within the dataset to make their final predictions. Various …
Loss re-scaling VQA: Revisiting the language prior problem from a class-imbalance view
Recent studies have pointed out that many well-developed Visual Question Answering
(VQA) models are heavily affected by the language prior problem. It refers to making …
Coca: Collaborative causal regularization for audio-visual question answering
Abstract Audio-Visual Question Answering (AVQA) is a sophisticated QA task that aims at
answering textual questions over given video-audio pairs with comprehensive multimodal …