Deep Multimodal Data Fusion

F Zhao, C Zhang, B Geng - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data
(e.g., images, texts, or data collected from different sensors), feature engineering (e.g., …

Contrastive region guidance: Improving grounding in vision-language models without training

D Wan, J Cho, E Stengel-Eskin, M Bansal - European Conference on …, 2025 - Springer
Highlighting particularly relevant regions of an image can improve the performance of vision-
language models (VLMs) on various vision-language (VL) tasks by guiding the model to …

A comprehensive survey on answer generation methods using NLP

P Upadhyay, R Agarwal, S Dhiman, A Sarkar… - Natural Language …, 2024 - Elsevier
Recent advancements in question-answering systems have significantly enhanced the
capability of computers to understand and respond to queries in natural language. This …

Question-conditioned debiasing with focal visual context fusion for visual question answering

J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier
Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …

Signing outside the studio: Benchmarking background robustness for continuous sign language recognition

Y Jang, Y Oh, JW Cho, DJ Kim, JS Chung… - arXiv preprint arXiv …, 2022 - arxiv.org
The goal of this work is background-robust continuous sign language recognition. Most
existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed …

Enhancing robust VQA via contrastive and self-supervised learning

R Cao, Z Li, Z Tang, C Zhang, H Ma - Pattern Recognition, 2025 - Elsevier
Visual Question Answering (VQA) aims to evaluate the reasoning abilities of an
intelligent agent using visual and textual information. However, recent research indicates …

Robust visual question answering via polarity enhancement and contrast

D Peng, Z Li - Neural Networks, 2024 - Elsevier
The Visual Question Answering (VQA) task is an important research direction in the
field of artificial intelligence, which requires a model that can simultaneously understand …

Robust Visual Question Answering utilizing Bias Instances and Label Imbalance

L Zhao, K Li, J Qi, Y Sun, Z Zhu - Knowledge-Based Systems, 2024 - Elsevier
Visual Question Answering (VQA) models often suffer from bias issues, which cause
their predictions to rely on superficial correlations in datasets rather than the intrinsic …

Towards Deconfounded Visual Question Answering via Dual-causal Intervention

D Peng, W Wei - Proceedings of the 33rd ACM International Conference …, 2024 - dl.acm.org
The Visual Question Answering (VQA) task has recently become notorious because models
are prone to predicting well-educated "guesses" as answers rather than deriving them …

Counterfactual Mix-up for visual question answering

JW Cho, DJ Kim, Y Jung, IS Kweon - IEEE Access, 2023 - ieeexplore.ieee.org
Counterfactuals have been shown to be a powerful method for alleviating unimodal bias in
Visual Question Answering. However, existing …