Comprehensive image captioning via scene graph decomposition

Y Zhong, L Wang, J Chen, D Yu, Y Li - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer
We address the challenging problem of image captioning by revisiting the representation of
image scene graph. At the core of our method lies the decomposition of a scene graph into a …

Show, control and tell: A framework for generating controllable and grounded captions

M Cornia, L Baraldi… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Current captioning approaches can describe images using black-box architectures whose
behavior is hardly controllable and explainable from the exterior. As an image can be …

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …

Image captioning model using attention and object features to mimic human image understanding

MA Al-Malla, A Jafar, N Ghneim - Journal of Big Data, 2022 - Springer
Image captioning spans the fields of computer vision and natural language processing. The
image captioning task generalizes object detection where the descriptions are a single …

Fast, diverse and accurate image captioning guided by part-of-speech

A Deshpande, J Aneja, L Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com
Image captioning is an ambiguous problem, with many suitable captions for an image. To
address ambiguity, beam search is the de facto method for sampling multiple captions …

Like hiking? You probably enjoy nature: Persona-grounded dialog with commonsense expansions

BP Majumder, H Jhamtani, T Berg-Kirkpatrick… - arXiv preprint arXiv …, 2020 - arxiv.org
Existing persona-grounded dialog models often fail to capture simple implications of given
persona descriptions, something which humans are able to do seamlessly. For example …

Distilling translations with visual awareness

J Ive, P Madhyastha, L Specia - arXiv preprint arXiv:1906.07701, 2019 - arxiv.org
Previous work on multimodal machine translation has shown that visual information is only
needed in very specific cases, for example in the presence of ambiguous words where the …

MSCTD: A multimodal sentiment chat translation dataset

Y Liang, F Meng, J Xu, Y Chen, J Zhou - arXiv preprint arXiv:2202.13645, 2022 - arxiv.org
Multimodal machine translation and textual chat translation have received considerable
attention in recent years. Although the conversation in its natural form is usually multimodal …

Image captioning based on scene graphs: A survey

J Jia, X Ding, S Pang, X Gao, X Xin, R Hu… - Expert Systems with …, 2023 - Elsevier
Although recent developments in deep learning have brought several tasks closer to human
performance, there is still a significant gap between human and machine performance in …

ShapeCaptioner: Generative caption network for 3D shapes by learning a mapping from parts detected in multiple views to sentences

Z Han, C Chen, YS Liu, M Zwicker - Proceedings of the 28th ACM …, 2020 - dl.acm.org
3D shape captioning is a challenging application in 3D shape understanding. Captions from
recent multi-view based methods reveal that they cannot capture part-level characteristics of …