Bridging different language models and generative vision models for text-to-image generation

S Liu, W Pu, C Xu, Z Huang, Q Li, H Wang, C Lin… - 2024 - researchsquare.com

Recent advancements in MLLM, such as those exemplified by developments like GPT-4o,
have positioned them as a significant focus within the research community. MLLMs leverage …

Cited by 1 · Related articles

[PDF] arxiv.org

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Cited by 9 · Related articles · All 2 versions

[PDF] arxiv.org

Improving Long-Text Alignment for Text-to-Image Diffusion Models

L Liu, C Du, T Pang, Z Wang, C Li, D Xu - arXiv preprint arXiv:2410.11817, 2024 - arxiv.org

The rapid advancement of text-to-image (T2I) diffusion models has enabled them to
generate unprecedented results from given texts. However, as text inputs become longer …

Natural Language Inference Improves Compositionality in Vision-Language Models

P Cascante-Bonilla, Y Hou, YT Cao… - arXiv preprint arXiv …, 2024 - arxiv.org

Compositional reasoning in Vision-Language Models (VLMs) remains challenging as these
models often struggle to relate objects, attributes, and spatial relationships. Recent methods …

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

B Ma, Z Zong, G Song, H Li, Y Liu - arXiv preprint arXiv:2406.11831, 2024 - arxiv.org

Large language models (LLMs) based on decoder-only transformers have demonstrated
superior text understanding capabilities compared to CLIP and T5-series models. However …

Cited by 6 · Related articles

[PDF] arxiv.org

MAPWise: Evaluating Vision-Language Models for Advanced Map Queries

S Mukhopadhyay, A Rajgaria, P Khatiwada… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision-language models (VLMs) excel at tasks requiring joint understanding of visual and
linguistic information. A particularly promising yet under-explored application for these …