Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Translation between molecules and natural language

C Edwards, T Lai, K Ros, G Honke, K Cho… - arXiv preprint arXiv …, 2022 - arxiv.org
We present MolT5, a self-supervised learning framework for pretraining
models on a vast amount of unlabeled natural language text and molecule strings. …

EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval

H Ma, H Zhao, Z Lin, A Kale, Z Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
… recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to E …

Visual News: Benchmark and challenges in news image captioning

F Liu, Y Wang, T Wang, V Ordonez - arXiv preprint arXiv:2010.03743, 2020 - arxiv.org
We propose Visual News Captioner, an entity-aware model for the task of news image
captioning. We also introduce Visual News, a large-scale benchmark consisting of more …

Good news, everyone! Context driven entity-aware captioning for news images

AF Biten, L Gomez, M Rusinol… - Proceedings of the …, 2019 - openaccess.thecvf.com
Current image captioning systems perform at a merely descriptive level, essentially
enumerating the objects in the scene and their relations. Humans, on the contrary, interpret …

NWPU-Captions dataset and MLCA-Net for remote sensing image captioning

Q Cheng, H Huang, Y Xu, Y Zhou, H Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recently, the burgeoning demands for captioning-related applications have inspired great
endeavors in the remote sensing community. However, current benchmark datasets are …

Transform and tell: Entity-aware news image captioning

A Tran, A Mathews, L Xie - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
We propose an end-to-end model which generates captions for images embedded in news
articles. News images present two key challenges: they rely on real-world knowledge …

Boosting entity-aware image captioning with multi-modal knowledge graph

W Zhao, X Wu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Entity-aware image captioning aims to describe named entities and events related to the
image by utilizing the background knowledge in the associated article. This task remains …

Explain me the painting: Multi-topic knowledgeable art description generation

Z Bai, Y Nakashima, N Garcia - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Have you ever looked at a painting and wondered what is the story behind it? This work
presents a framework to bring art closer to people by generating comprehensive …

Multilayer dense attention model for image caption

EK Wang, X Zhang, F Wang, TY Wu, CM Chen - IEEE Access, 2019 - ieeexplore.ieee.org
The image caption is a technology that enables us to understand the contents and generate
descriptive text, of images using machines. With the development of deep learning, means …