Semi-supervised image captioning by adversarially propagating labeled data

DJ Kim, TH Oh, J Choi, IS Kweon - IEEE Access, 2024 - ieeexplore.ieee.org
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is expensive in terms of labor, time, and cost. In contrast to manually annotating all the training samples, separately collecting uni-modal datasets is far easier, e.g., a large-scale image dataset and a sentence dataset. We leverage such massive unpaired image and caption data on top of standard paired data by learning to associate them. To this end, our semi-supervised learning method assigns pseudo-labels to unpaired images and captions in an adversarial learning fashion, where the joint distribution of images and captions is learned. This approach yields noticeable performance improvements even in challenging scenarios, including out-of-task data and web-crawled data. We also show that our proposed method is theoretically well-motivated and has a favorable global optimality property. Extensive empirical results on captioning datasets, followed by a comprehensive analysis on the scarcely-paired COCO dataset, demonstrate the consistent effectiveness of our method compared to competing ones.