Evolutionary optimization of model merging recipes

T Akiba, M Shing, Y Tang, Q Sun, D Ha - arXiv preprint arXiv:2403.13187, 2024 - arxiv.org
We present a novel application of evolutionary algorithms to automate the creation of
powerful foundation models. While model merging has emerged as a promising approach …
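
A minimal sketch of the general idea of searching for a merging recipe with an evolutionary loop: a simple (1+lambda) strategy mutates per-layer interpolation weights between two fine-tuned checkpoints and keeps whatever scores best. The fitness function, checkpoints, and hyperparameters here are hypothetical placeholders, not the paper's actual recipe search.

    import numpy as np

    def merge(theta_a, theta_b, alphas):
        """Interpolate two state dicts layer by layer; alphas lie in [0, 1]."""
        return {k: a * theta_a[k] + (1 - a) * theta_b[k]
                for k, a in zip(theta_a, alphas)}

    def evolve_recipe(theta_a, theta_b, fitness, pop=16, gens=50, sigma=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n_layers = len(theta_a)
        best = np.full(n_layers, 0.5)          # start from uniform averaging
        best_score = fitness(merge(theta_a, theta_b, best))
        for _ in range(gens):
            # mutate the current best recipe and keep any improvement
            cands = np.clip(best + sigma * rng.standard_normal((pop, n_layers)), 0.0, 1.0)
            for c in cands:
                score = fitness(merge(theta_a, theta_b, c))
                if score > best_score:
                    best, best_score = c, score
        return best, best_score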

Towards modular LLMs by building and reusing a library of LoRAs

O Ostapenko, Z Su, EM Ponti, L Charlin… - arXiv preprint arXiv …, 2024 - arxiv.org
The growing number of parameter-efficient adaptations of a base large language model
(LLM) calls for studying whether we can reuse such trained adapters to improve …
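
The basic operation behind reusing trained adapters from a library is folding a LoRA's low-rank update back into a base weight matrix. A minimal sketch under the usual LoRA formulation (W + (alpha / rank) * B @ A); the arrays and the alpha/rank values are illustrative placeholders, not a specific library's API.

    import numpy as np

    def fold_lora(W, A, B, alpha=16.0, rank=8):
        """Return the merged weight W + (alpha / rank) * B @ A."""
        return W + (alpha / rank) * (B @ A)

    d_out, d_in, rank = 64, 64, 8
    W = np.random.randn(d_out, d_in)   # frozen base weight
    A = np.random.randn(rank, d_in)    # LoRA down-projection
    B = np.random.randn(d_out, rank)   # LoRA up-projection
    W_merged = fold_lora(W, A, B, rank=rank)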

Variational learning is effective for large deep networks

Y Shen, N Daheim, B Cong, P Nickl… - arXiv preprint arXiv …, 2024 - arxiv.org
We give extensive empirical evidence against the common belief that variational learning is
ineffective for large neural networks. We show that an optimizer called Improved Variational …

Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging is an efficient technique in the machine learning community that does not
require collecting raw training data and does not require expensive …

A Practitioner's Guide to Continual Multimodal Pretraining

K Roth, V Udandarao, S Dziadzio, A Prabhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal foundation models serve numerous applications at the intersection of vision and
language. Still, despite being pretrained on extensive data, they become outdated over time …

FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis

S Sanjeev, N Zhaksylyk, I Almakky… - arXiv preprint arXiv …, 2024 - arxiv.org
The scarcity of well-annotated medical datasets requires leveraging transfer learning from
broader datasets like ImageNet or pre-trained models like CLIP. The model soups approach averages …
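
Since the snippet refers to model soups averaging fine-tuned weights, a minimal sketch of a uniform soup over several checkpoints with the same architecture follows. The state dicts below are placeholders; real checkpoints would be loaded from disk.

    import numpy as np

    def uniform_soup(state_dicts):
        """Average the parameters of identically shaped checkpoints key by key."""
        keys = state_dicts[0].keys()
        return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

    checkpoints = [{"layer.weight": np.random.randn(4, 4)} for _ in range(3)]
    souped = uniform_soup(checkpoints)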

Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning

I Achituve, I Diamant, A Netzer, G Chechik… - arXiv preprint arXiv …, 2024 - arxiv.org
As machine learning becomes more prominent, there is a growing demand to perform
several inference tasks in parallel. Running a dedicated model for each task is …

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

A Panda, B Isik, X Qi, S Koyejo, T Weissman… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing methods for adapting large language models (LLMs) to new tasks are not suited to
multi-task adaptation because they modify all the model weights--causing destructive …
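
The snippet contrasts with methods that modify all model weights; one way to avoid that is to keep only a sparse subset of the fine-tuning update so most base weights stay untouched. A minimal sketch of that idea: the top-magnitude criterion and the 10% sparsity level are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def sparse_update(w_base, w_tuned, keep_frac=0.1):
        """Apply only the largest-magnitude entries of the fine-tuning delta."""
        delta = w_tuned - w_base
        k = max(1, int(keep_frac * delta.size))
        thresh = np.partition(np.abs(delta).ravel(), -k)[-k]
        mask = np.abs(delta) >= thresh      # keep the largest-magnitude changes
        return w_base + mask * delta

    w_base = np.random.randn(128, 128)
    w_tuned = w_base + 0.01 * np.random.randn(128, 128)
    w_adapted = sparse_update(w_base, w_tuned)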

Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

T Fu, D Cai, L Liu, S Shi, R Yan - arXiv preprint arXiv:2405.13432, 2024 - arxiv.org
Supervised fine-tuning (SFT) on an instruction-following corpus is a crucial approach toward
the alignment of large language models (LLMs). However, the performance of LLMs on …

You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging

W Chen, J Kwok - arXiv preprint arXiv:2408.12105, 2024 - arxiv.org
Model merging, which combines multiple models into a single model, has gained increasing
popularity in recent years. By efficiently integrating the capabilities of various models without …
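
A minimal sketch of preference-aware merging in its simplest linear form: a preference vector over tasks picks one convex combination of task-specific models, and sweeping the preference traces out a family of trade-off models. The placeholder state dicts and the plain weighted average are assumptions for illustration, not the paper's learned Pareto set.

    import numpy as np

    def merge_by_preference(task_models, preference):
        """Convex combination of state dicts; preference is normalised to sum to 1."""
        preference = np.asarray(preference, dtype=float)
        preference = preference / preference.sum()
        keys = task_models[0].keys()
        return {k: sum(p * m[k] for p, m in zip(preference, task_models))
                for k in keys}

    models = [{"w": np.random.randn(4, 4)} for _ in range(2)]
    for lam in np.linspace(0.0, 1.0, 5):   # sweep the two-task trade-off
        merged = merge_by_preference(models, [lam, 1.0 - lam])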