Knowledge graphs meet multi-modal learning: A comprehensive survey
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …
Translation between molecules and natural language
We present $\textbf{MolT5}$, a self-supervised learning framework for pretraining
models on a vast amount of unlabeled natural language text and molecule strings. $\textbf …
Ei-clip: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval
… recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to E …
Visual news: Benchmark and challenges in news image captioning
We propose Visual News Captioner, an entity-aware model for the task of news image
captioning. We also introduce Visual News, a large-scale benchmark consisting of more …
Good news, everyone! context driven entity-aware captioning for news images
Current image captioning systems perform at a merely descriptive level, essentially
enumerating the objects in the scene and their relations. Humans, on the contrary, interpret …
NWPU-captions dataset and MLCA-net for remote sensing image captioning
Q Cheng, H Huang, Y Xu, Y Zhou, H Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recently, the burgeoning demands for captioning-related applications have inspired great
endeavors in the remote sensing community. However, current benchmark datasets are …
Transform and tell: Entity-aware news image captioning
We propose an end-to-end model which generates captions for images embedded in news
articles. News images present two key challenges: they rely on real-world knowledge …
Boosting entity-aware image captioning with multi-modal knowledge graph
W Zhao, X Wu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Entity-aware image captioning aims to describe named entities and events related to the
image by utilizing the background knowledge in the associated article. This task remains …
Explain me the painting: Multi-topic knowledgeable art description generation
Have you ever looked at a painting and wondered what is the story behind it? This work
presents a framework to bring art closer to people by generating comprehensive …
Multilayer dense attention model for image caption
The image caption is a technology that enables us to understand the contents and generate
descriptive text, of images using machines. With the development of deep learning, means …