A Comprehensive Survey of Multimodal Large Language Models: Concept, Application and Safety
S Liu, W Pu, C Xu, Z Huang, Q Li, H Wang, C Lin… - 2024 - researchsquare.com
Recent advancements in MLLM, such as those exemplified by developments like GPT-4o,
have positioned them as a significant focus within the research community. MLLMs leverage …
LLMs Meet Multimodal Generation and Editing: A Survey
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …
Improving Long-Text Alignment for Text-to-Image Diffusion Models
The rapid advancement of text-to-image (T2I) diffusion models has enabled them to
generate unprecedented results from given texts. However, as text inputs become longer …
Natural Language Inference Improves Compositionality in Vision-Language Models
Compositional reasoning in Vision-Language Models (VLMs) remains challenging as these
models often struggle to relate objects, attributes, and spatial relationships. Recent methods …
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Large language models (LLMs) based on decoder-only transformers have demonstrated
superior text understanding capabilities compared to CLIP and T5-series models. However …
MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
S Mukhopadhyay, A Rajgaria, P Khatiwada… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) excel at tasks requiring joint understanding of visual and
linguistic information. A particularly promising yet under-explored application for these …