Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging is an efficient empowerment technique in the machine learning community
that does not require the collection of raw training data and does not require expensive …

BadMerging: Backdoor Attacks Against Model Merging

J Zhang, J Chi, Z Li, K Cai, Y Zhang… - Proceedings of the 2024 on …, 2024 - dl.acm.org
Fine-tuning pre-trained models for downstream tasks has led to a proliferation of
open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective …

Scalable ranked preference optimization for text-to-image generation

S Karthik, H Coskun, Z Akata, S Tulyakov, J Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct Preference Optimization (DPO) has emerged as a powerful approach to align
text-to-image (T2I) models with human feedback. Unfortunately, successful application of DPO to …

Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming

Z Gao, W Huang, J Zhang, A Kembhavi… - arXiv preprint arXiv …, 2024 - arxiv.org
DALL-E and Sora have gained attention by producing implausible images, such as
"astronauts riding a horse in space." Despite the proliferation of text-to-vision models that …

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

R Zhao, H Yuan, Y Wei, S Zhang, Y Gu, L Ran… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in generation models have showcased remarkable capabilities in
generating fantastic content. However, most of them are trained on proprietary high-quality …

Camera Settings as Tokens: Modeling Photography on Latent Diffusion Models

IS Fang, YH Han, JC Chen - SIGGRAPH Asia 2024 Conference Papers, 2024 - dl.acm.org
Text-to-image models have revolutionized content creation, enabling users to generate
images from natural language prompts. While recent advancements in conditioning these …

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Z Wang, J Li, H Lin, J Yoon, M Bansal - arXiv preprint arXiv:2411.16657, 2024 - arxiv.org
Storytelling video generation (SVG) has recently emerged as a task to create long,
multi-motion, multi-scene videos that consistently represent the story described in the input text …

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement

D Lee, J Yoon, J Cho, M Bansal - arXiv preprint arXiv:2411.15115, 2024 - arxiv.org
Recent text-to-video (T2V) diffusion models have demonstrated impressive generation
capabilities across various domains. However, these models often generate videos that …