Semi-supervised image captioning by adversarially propagating labeled data

DJ Kim, TH Oh, J Choi, IS Kweon - IEEE Access, 2024 - ieeexplore.ieee.org
We present a novel data-efficient semi-supervised framework to improve the generalization
of image captioning models. Constructing a large-scale labeled image captioning dataset is …

Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality

Y Oh, JW Cho, DJ Kim, IS Kweon, J Kim - arXiv preprint arXiv:2410.05210, 2024 - arxiv.org
In this paper, we propose a new method to enhance compositional understanding in
pre-trained vision and language models (VLMs) without sacrificing performance in zero-shot …

Unbiased Visual Question Answering by Leveraging Instrumental Variable

Y Pan, J Liu, L Jin, Z Li - IEEE Transactions on Multimedia, 2024 - ieeexplore.ieee.org
Existing unbiased visual question answering (VQA) models reduce the spurious correlation
between questions and answers to force the models to focus on visual information …

Enhanced Visual Question Answering System Using DenseNet

S Nithish, EM Kawinbalaji… - … on Advances in Data …, 2024 - ieeexplore.ieee.org
Visual Question Answering (VQA) system represents an essential usage of computer vision
and natural language processing, enabling machines to understand and react to inquiries …