From images to textual prompts: Zero-shot visual question answering with frozen large language models
Large language models (LLMs) have demonstrated excellent zero-shot generalization to
new language tasks. However, effective utilization of LLMs for zero-shot visual question …
Rethinking data augmentation for robust visual question answering
Data Augmentation (DA)—generating extra training samples beyond the original training set—
has been widely used in today's unbiased VQA models to mitigate language biases. Current …
All you may need for VQA are image captions
S Changpinyo, D Kukliansky, I Szpektor… - arXiv preprint arXiv …, 2022 - arxiv.org
Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but
has not enjoyed the same level of engagement in terms of data creation. In this paper, we …
Counterfactual samples synthesizing and training for robust visual question answering
Today's VQA models still tend to capture superficial linguistic correlations in the training set
and fail to generalize to the test set with different QA distributions. To reduce these language …
Debiased Visual Question Answering via the perspective of question types
Visual Question Answering (VQA) aims to answer questions according to the given
image. However, current VQA models tend to rely solely on textual information from the …
Digging out discrimination information from generated samples for robust visual question answering
Visual Question Answering (VQA) aims to answer a textual question based on a
given image. Nevertheless, recent studies have shown that VQA models tend to capture the …
Robust visual question answering: Datasets, methods, and future challenges
J Ma, P Wang, D Kong, Z Wang, J Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual question answering requires a system to provide an accurate natural language
answer given an image and a natural language question. However, it is widely recognized …
Empirical study on using adapters for debiased Visual Question Answering
In this work, we empirically study debiased Visual Question Answering (VQA) works with
Adapters. Most VQA debiasing works sacrifice in-distribution (ID) performance for the sake of …
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
Finetuning a large vision-language model (VLM) on a target dataset after large-scale
pretraining is a dominant paradigm in visual question answering (VQA). Datasets for …