Contrastive region guidance: Improving grounding in vision-language models without training
Highlighting particularly relevant regions of an image can improve the performance of vision-
language models (VLMs) on various vision-language (VL) tasks by guiding the model to …
A comprehensive survey on answer generation methods using NLP
P Upadhyay, R Agarwal, S Dhiman, A Sarkar… - Natural Language …, 2024 - Elsevier
Recent advancements in question-answering systems have significantly enhanced the
capability of computers to understand and respond to queries in natural language. This …
Question-conditioned debiasing with focal visual context fusion for visual question answering
J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier
Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …
Signing outside the studio: Benchmarking background robustness for continuous sign language recognition
The goal of this work is background-robust continuous sign language recognition. Most
existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed …
Enhancing robust VQA via contrastive and self-supervised learning
Visual Question Answering (VQA) aims to evaluate the reasoning abilities of an
intelligent agent using visual and textual information. However, recent research indicates …
Robust visual question answering via polarity enhancement and contrast
D Peng, Z Li - Neural Networks, 2024 - Elsevier
The Visual Question Answering (VQA) task is an important research direction in the
field of artificial intelligence, which requires a model that can simultaneously understand …
Robust Visual Question Answering utilizing Bias Instances and Label Imbalance
L Zhao, K Li, J Qi, Y Sun, Z Zhu - Knowledge-Based Systems, 2024 - Elsevier
Visual Question Answering (VQA) models often suffer from bias issues which cause
their predictions to rely on superficial correlations in datasets rather than the intrinsic …
Towards Deconfounded Visual Question Answering via Dual-causal Intervention
D Peng, W Wei - Proceedings of the 33rd ACM International Conference …, 2024 - dl.acm.org
The Visual Question Answering (VQA) task has recently become notorious because models
are prone to predicting well-educated "guesses" as answers rather than deriving them …
Counterfactual Mix-up for visual question answering
Counterfactuals have been shown to be a powerful method in Visual Question Answering in
the alleviation of Visual Question Answering's unimodal bias. However, existing …