Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

T Bai, H Liang, B Wan, L Yang, B Li, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Human beings perceive the world through diverse senses such as sight, smell, hearing, and
touch. Similarly, multimodal large language models (MLLMs) enhance the capabilities of …

Cited by 12 Related articles All 2 versions

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

Cited by 1 Related articles All 2 versions

SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

Z Liu, H Liang, W Xiong, Q Yu, C He, B Cui… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, with the rise of web images, managing and understanding large-scale image
datasets has become increasingly important. Vision Large Language Models (VLLMs) have …

MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark

M Zhou, H Liang, T Li, Z Wu, M Lin, L Sun… - arXiv preprint arXiv …, 2024 - arxiv.org

With the development of Multimodal Large Language Models (MLLMs), the evaluation of
multimodal models in the context of mathematical problems has become a valuable …

Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization

Z Tang, J Peng, J Tang, M Hong, F Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

In this work, we focus on the alignment problem of diffusion models with a continuous
reward function, which represents specific objectives for downstream tasks, such as …