Entity-aware image caption generation

D Lu, S Whitehead, L Huang, H Ji… - arXiv preprint arXiv …, 2018 - arxiv.org
arXiv preprint arXiv:1804.07889, 2018
Current image captioning approaches generate descriptions that lack specific information, such as the named entities involved in the images. In this paper we propose a new task that aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to this problem. We first train a convolutional neural network and long short-term memory network (CNN-LSTM) model to generate a template caption from the input image. We then use a knowledge-graph-based collective inference algorithm to fill in the template with specific named entities retrieved via the hashtags. Experiments on a new benchmark dataset collected from Flickr show that our model generates news-style image descriptions with much richer information. Our model significantly outperforms unimodal baselines on various evaluation metrics.
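
The template-filling step described above can be illustrated with a small sketch. This is not the authors' implementation: the slot syntax (`<PERSON>`, `<LOCATION>`), the function name `fill_template`, and the candidate scores are all hypothetical, and a simple per-type greedy choice stands in for the knowledge-graph-based collective inference that the abstract mentions. The template is assumed to come from a CNN-LSTM captioner, and the candidates from entities retrieved via hashtags.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical slot syntax: typed placeholders such as <PERSON> or <LOCATION>.
SLOT = re.compile(r"<([A-Z]+)>")


def fill_template(
    template: str,
    candidates: Dict[str, List[Tuple[str, float]]],
) -> str:
    """Fill each typed slot with the best-scoring unused candidate of that type.

    This greedy choice is a stand-in for collective inference, which would
    score candidates jointly rather than one slot at a time.
    """
    # Rank the candidates of each type by descending (assumed) relevance score.
    ranked = {
        slot_type: sorted(items, key=lambda item: item[1], reverse=True)
        for slot_type, items in candidates.items()
    }
    used = set()

    def pick(match: "re.Match[str]") -> str:
        slot_type = match.group(1)
        for name, _score in ranked.get(slot_type, []):
            if name not in used:
                used.add(name)
                return name
        # No candidate left for this type: keep the slot unfilled.
        return match.group(0)

    return SLOT.sub(pick, template)


# Placeholder entities, not data from the paper.
example_candidates = {
    "PERSON": [("John Roe", 0.4), ("Jane Doe", 0.9)],
    "LOCATION": [("Springfield", 0.7)],
}
print(fill_template("<PERSON> speaks at a rally in <LOCATION>", example_candidates))
```

With the placeholder candidates above, the template "<PERSON> speaks at a rally in <LOCATION>" becomes "Jane Doe speaks at a rally in Springfield". Slots with no remaining candidate of the right type are left as they are.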