Frozen transformers in language models are effective visual encoder layers

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Cited by 25 - Related articles - All 2 versions

Exploring the potential of large language models (LLMs) in learning on graphs

Z Chen, H Mao, H Li, W Jin, H Wen, X Wei… - ACM SIGKDD …, 2024 - dl.acm.org

Learning on Graphs has attracted immense attention due to its wide real-world applications.
The most popular pipeline for learning on graphs with textual node attributes primarily relies …

Cited by 218 - Related articles - All 9 versions

Language models are free boosters for biomedical imaging tasks

Z Lai, J Wu, S Chen, Y Zhou, A Hovakimyan… - arXiv preprint arXiv …, 2024 - arxiv.org

In this study, we uncover the unexpected efficacy of residual-based large language models
(LLMs) as part of encoders for biomedical imaging tasks, a domain traditionally devoid of …

Cited by 15 - Related articles - All 2 versions

Situational Awareness Matters in 3D Vision Language Reasoning

Y Man, LY Gui, YX Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Being able to carry out complicated vision language reasoning tasks in 3D space represents
a significant milestone in developing household robots and human-centered embodied AI …

Cited by 6 - Related articles - All 4 versions

Residual-based Language Models are Free Boosters for Biomedical Imaging Tasks

Z Lai, J Wu, S Chen, Y Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this study, we uncover the unexpected efficacy of residual-based large language models
(LLMs) as part of encoders for biomedical imaging tasks, a domain traditionally devoid of …

Cited by 9 - Related articles

Multivariate Time-Series Anomaly Detection based on Enhancing Graph Attention Networks with Topological Analysis

Z Liu, X Huang, J Zhang, Z Hao, L Sun… - Proceedings of the 33rd …, 2024 - dl.acm.org

Unsupervised anomaly detection in time series is essential in industrial applications, as it
significantly reduces the need for manual intervention. Multivariate time series pose a …

Cited by 1 - Related articles - All 3 versions

Lexicon3D: Probing visual foundation models for complex 3D scene understanding

Y Man, S Zheng, Z Bao, M Hebert, LY Gui… - arXiv preprint arXiv …, 2024 - arxiv.org

Complex 3D scene understanding has gained increasing attention, with scene encoding
strategies playing a crucial role in this success. However, the optimal scene encoding …

Cited by 1 - Related articles - All 2 versions

Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

Y Jiang, W Zhang, X Zhang, X Wei, CW Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we investigate the feasibility of leveraging large language models (LLMs) for
integrating general knowledge and incorporating pseudo-events as priors for temporal …

Cited by 1 - Related articles - All 4 versions

LLM4GEN: Leveraging semantic representation of LLMs for text-to-image generation

M Liu, Y Ma, X Zhang, Y Zhen, Z Zhao, Z Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion Models have exhibited substantial success in text-to-image generation. However,
they often encounter challenges when dealing with complex and dense prompts that involve …

Cited by 1 - Related articles - All 2 versions

Adapting LLaMA Decoder to Vision Transformer

J Wang, W Shao, M Chen, C Wu, Y Liu, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

This work examines whether decoder-only Transformers such as LLaMA, which were
originally designed for large language models (LLMs), can be adapted to the computer …

Cited by 1 - Related articles - All 2 versions