Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Exploring the potential of large language models (LLMs) in learning on graphs

Z Chen, H Mao, H Li, W Jin, H Wen, X Wei… - ACM SIGKDD …, 2024 - dl.acm.org
Learning on Graphs has attracted immense attention due to its wide real-world applications.
The most popular pipeline for learning on graphs with textual node attributes primarily relies …

Language models are free boosters for biomedical imaging tasks

Z Lai, J Wu, S Chen, Y Zhou, A Hovakimyan… - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we uncover the unexpected efficacy of residual-based large language models
(LLMs) as part of encoders for biomedical imaging tasks, a domain traditionally devoid of …

Situational Awareness Matters in 3D Vision Language Reasoning

Y Man, LY Gui, YX Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Being able to carry out complicated vision language reasoning tasks in 3D space represents
a significant milestone in developing household robots and human-centered embodied AI …

Residual-based Language Models are Free Boosters for Biomedical Imaging Tasks

Z Lai, J Wu, S Chen, Y Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this study, we uncover the unexpected efficacy of residual-based large language models
(LLMs) as part of encoders for biomedical imaging tasks, a domain traditionally devoid of …

Multivariate Time-Series Anomaly Detection based on Enhancing Graph Attention Networks with Topological Analysis

Z Liu, X Huang, J Zhang, Z Hao, L Sun… - Proceedings of the 33rd …, 2024 - dl.acm.org
Unsupervised anomaly detection in time series is essential in industrial applications, as it
significantly reduces the need for manual intervention. Multivariate time series pose a …

Lexicon3D: Probing visual foundation models for complex 3D scene understanding

Y Man, S Zheng, Z Bao, M Hebert, LY Gui… - arXiv preprint arXiv …, 2024 - arxiv.org
Complex 3D scene understanding has gained increasing attention, with scene encoding
strategies playing a crucial role in this success. However, the optimal scene encoding …

Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

Y Jiang, W Zhang, X Zhang, X Wei, CW Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we investigate the feasibility of leveraging large language models (LLMs) for
integrating general knowledge and incorporating pseudo-events as priors for temporal …

LLM4GEN: Leveraging semantic representation of LLMs for text-to-image generation

M Liu, Y Ma, X Zhang, Y Zhen, Z Zhao, Z Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have exhibited substantial success in text-to-image generation. However,
they often encounter challenges when dealing with complex and dense prompts that involve …

Adapting LLaMA Decoder to Vision Transformer

J Wang, W Shao, M Chen, C Wu, Y Liu, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This work examines whether decoder-only Transformers such as LLaMA, which were
originally designed for large language models (LLMs), can be adapted to the computer …