Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

A survey on neural question generation: Methods, applications, and prospects

S Guo, L Liao, C Li, TS Chua - arXiv preprint arXiv:2402.18267, 2024 - arxiv.org
In this survey, we present a detailed examination of the advancements in Neural Question
Generation (NQG), a field leveraging neural network techniques to generate relevant …

AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering

M Ukai, S Kurita, A Hashimoto, Y Ushiku… - Proceedings of the 32nd …, 2024 - dl.acm.org
Visual question answering aims to provide responses to questions given visual input.
Recently, visual programmatic models (VPMs), which generate programs to answer …

Deconfounded Emotion Guidance Sticker Selection with Causal Inference

J Chen, Y Cai, R Xu, J Wang, J Xie, Q Li - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
With the increasing popularity of online social applications, stickers have become common
in online chats. Teaching a model to select the appropriate sticker from a set of candidate …

Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model

L Yuan, Y Cai, J Huang - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Joint Multimodal Entity-Relation Extraction (JMERE) is a challenging task that aims to extract
entities and their relations from text-image pairs in social media posts. Existing methods for …

Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis

PH Huang, JL Li, CP Chen, MC Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large vision-language models (LVLM) have significantly enhanced
their ability to comprehend visual inputs alongside natural language. However, a major …

Diverse Visual Question Generation Based on Multiple Objects Selection

W Fang, J Xie, H Liu, J Chen, Y Cai - ACM Transactions on Multimedia …, 2024 - dl.acm.org
Visual question generation task aims at generating high-quality questions about a given
image. To make this task applicable to various scenarios, e.g., the growing demand for exams …

DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams

X Zhang, L Zhang, Y Wu, M Huang, W Wu, B Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual Question Generation (VQG) has gained significant attention due to its potential in
educational applications. However, VQG research mainly focuses on natural images …

Knowledge Graphs for Multi-Modal Learning: Survey and Perspective

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - Available at SSRN … - papers.ssrn.com
Integrated with multi-modal learning, knowledge graphs (KGs), as structured knowledge
repositories, can enhance AI for processing and understanding complex, real-world data …