Overcoming Language Priors in Visual Question Answering with Cumulative Learning Strategy

A Mao, F Chen, Z Ma, K Lin - Available at SSRN 4740502 - papers.ssrn.com
The performance of visual question answering (VQA) has witnessed great progress over the
last few years. However, many current VQA models tend to rely on superficial linguistic …

Overcoming language priors via shuffling language bias for robust visual question answering

J Zhao, Z Yu, X Zhang, Y Yang - IEEE Access, 2023 - ieeexplore.ieee.org
Recent research has revealed the notorious language prior problem in visual question
answering (VQA) tasks based on visual-textual interaction, which indicates that well …

Overcoming language priors with self-contrastive learning for visual question answering

H Yan, L Liu, X Feng, Q Huang - Multimedia Tools and Applications, 2023 - Springer
Although remarkable success has been achieved in the last few years on the Visual
Question Answering (VQA) task, most existing models are heavily driven by the surface …

StableNet: Distinguishing the hard samples to overcome language priors in visual question answering

Z Yu, J Zhao, C Guo, Y Yang - IET Computer Vision, 2024 - Wiley Online Library
With the booming fields of computer vision and natural language processing, cross-modal
intersections such as visual question answering (VQA) have become very popular …

Overcoming language priors for visual question answering via loss rebalancing label and global context

R Cao, Z Li - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press
Despite the advances in Visual Question Answering (VQA), many VQA models currently
suffer from language priors (i.e., generating answers directly from questions without using …

Overcoming language priors with self-supervised learning for visual question answering

X Zhu, Z Mao, C Liu, P Zhang, B Wang… - arXiv preprint arXiv …, 2020 - arxiv.org
Most Visual Question Answering (VQA) models suffer from the language prior problem,
which is caused by inherent data biases. Specifically, VQA models tend to answer questions …

Deep Multi-Module Based Language Priors Mitigation Model for Visual Question Answering.

YU Shoujian, JIN Xueqin, WU Guowen… - Journal of Donghua …, 2023 - search.ebscohost.com
The original intention of visual question answering (VQA) models is to infer the answer
based on the relevant information of the question text in the visual image, but many VQA …

Towards robust visual question answering: Making the most of biased samples via contrastive learning

Q Si, Y Liu, F Meng, Z Lin, P Fu, Y Cao, W Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Models for Visual Question Answering (VQA) often rely on spurious correlations, i.e., the
language priors, that appear in the biased samples of the training set, which make them brittle …

Regulating Balance Degree for More Reasonable Visual Question Answering Benchmark

K Lin, A Mao, J Liu - 2022 International Joint Conference on …, 2022 - ieeexplore.ieee.org
Superficial linguistic correlation is a critical issue for Visual Question Answering (VQA),
where models can achieve high performance by exploiting the connection between question …

From superficial to deep: Language bias driven curriculum learning for visual question answering

M Lao, Y Guo, Y Liu, W Chen, N Pu… - Proceedings of the 29th …, 2021 - dl.acm.org
Most Visual Question Answering (VQA) models are faced with language bias when learning
to answer a given question, thereby failing to understand multimodal knowledge …