Semi-supervised image captioning by adversarially propagating labeled data
We present a novel data-efficient semi-supervised framework to improve the generalization
of image captioning models. Constructing a large-scale labeled image captioning dataset is …
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
In this paper, we propose a new method to enhance compositional understanding in pre-
trained vision and language models (VLMs) without sacrificing performance in zero-shot …
Unbiased Visual Question Answering by Leveraging Instrumental Variable
Y Pan, J Liu, L Jin, Z Li - IEEE Transactions on Multimedia, 2024 - ieeexplore.ieee.org
Existing unbiased visual question answering (VQA) models reduce the spurious correlation
between questions and answers to force the models to focus on visual information …
Enhanced Visual Question Answering System Using DenseNet
S Nithish, EM Kawinbalaji… - … on Advances in Data …, 2024 - ieeexplore.ieee.org
Visual Question Answering (VQA) system represents an essential usage of computer vision
and natural language processing, enabling machines to understand and react to inquiries …