Evolutionary Optimization of Model Merging Recipes
We present a novel application of evolutionary algorithms to automate the creation of
powerful foundation models. While model merging has emerged as a promising approach …
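
The snippet names the core mechanism: searching over merging recipes with an evolutionary algorithm. As a minimal sketch of that idea (not the paper's actual recipe, which searches over richer merging configurations), the following evolves per-layer interpolation coefficients for two models; `fitness` is a placeholder where a real dev-set evaluation would go:

    import numpy as np

    rng = np.random.default_rng(0)

    def merge(a, b, alphas):
        # Per-layer interpolation: w_k = alpha_k * a_k + (1 - alpha_k) * b_k.
        keys = sorted(a)
        return {k: alphas[i] * a[k] + (1 - alphas[i]) * b[k]
                for i, k in enumerate(keys)}

    def fitness(weights):
        # Placeholder objective; a real run would score the merged model
        # on a held-out development set.
        return -sum(float(np.abs(v).mean()) for v in weights.values())

    def evolve(a, b, pop=16, gens=20, sigma=0.1):
        # Simple mutation-and-select loop over the coefficient vector.
        n = len(a)
        best = rng.uniform(0.0, 1.0, n)
        for _ in range(gens):
            cands = [np.clip(best + sigma * rng.standard_normal(n), 0.0, 1.0)
                     for _ in range(pop)] + [best]
            best = max(cands, key=lambda al: fitness(merge(a, b, al)))
        return best
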
Towards Modular LLMs by Building and Reusing a Library of LoRAs
The growing number of parameter-efficient adaptations of a base large language model
(LLM) calls for studying whether we can reuse such trained adapters to improve …
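
As an illustration of adapter reuse, a weighted combination of LoRA deltas from a library can be applied to a base weight matrix. This is a hedged sketch, not the paper's routing method; `library` and `weights` are illustrative names:

    import numpy as np

    def apply_loras(W_base, library, weights):
        # library: list of (A, B) pairs with A of shape (r, d_in) and
        # B of shape (d_out, r); each LoRA delta is B @ A.
        delta = sum(w * (B @ A) for w, (A, B) in zip(weights, library))
        return W_base + delta

    # Toy usage: three adapters combined with illustrative weights.
    d_out, d_in, r = 8, 8, 2
    W = np.zeros((d_out, d_in))
    lib = [(np.random.randn(r, d_in), np.random.randn(d_out, r))
           for _ in range(3)]
    W_merged = apply_loras(W, lib, weights=[0.5, 0.3, 0.2])
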
Variational Learning Is Effective for Large Deep Networks
We give extensive empirical evidence against the common belief that variational learning is
ineffective for large neural networks. We show that an optimizer called Improved Variational …
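
To make the variational idea concrete: such methods maintain a Gaussian posterior over the weights and average predictions over posterior samples. The sketch below shows only this concept, with a mean-field posterior and the reparameterization trick; the paper's optimizer is considerably more sophisticated:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, log_sigma = np.zeros(4), np.full(4, -2.0)  # posterior parameters

    def sample_weights():
        # Reparameterization: w = mu + sigma * eps, with eps ~ N(0, I).
        return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)

    def predict(x, n_samples=8):
        # Bayesian model averaging over posterior samples.
        return np.mean([x @ sample_weights() for _ in range(n_samples)])

    print(predict(np.ones(4)))
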
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
Model merging is an efficient technique in the machine learning community that
requires neither the collection of raw training data nor expensive …
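
One representative data-free merging method that such surveys cover is task arithmetic: subtract the base weights from each fine-tuned model to get task vectors, then add their scaled sum back to the base. A minimal sketch of that method, not a statement about this survey's own contribution:

    import numpy as np

    def task_arithmetic(base, finetuned_models, lam=0.4):
        # base and each finetuned model: dicts of parameter name -> array.
        merged = {}
        for k in base:
            task_vectors = [ft[k] - base[k] for ft in finetuned_models]
            merged[k] = base[k] + lam * sum(task_vectors)
        return merged
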
A Practitioner's Guide to Continual Multimodal Pretraining
Multimodal foundation models serve numerous applications at the intersection of vision and
language. Still, despite being pretrained on extensive data, they become outdated over time …
FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis
The scarcity of well-annotated medical datasets calls for leveraging transfer learning
from broader datasets like ImageNet or pre-trained models like CLIP. The model soups approach averages …
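
For reference, the uniform model soup is just an element-wise average of several fine-tuned checkpoints that share one architecture; a minimal sketch with checkpoints as name-to-array dicts:

    import numpy as np

    def uniform_soup(checkpoints):
        # checkpoints: list of dicts mapping parameter name -> array,
        # all from fine-tunes of the same architecture.
        return {k: np.mean([ckpt[k] for ckpt in checkpoints], axis=0)
                for k in checkpoints[0]}
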
Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning
As machine learning becomes more prominent, there is a growing demand to perform
several inference tasks in parallel. Running a dedicated model for each task is …
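
A hedged sketch of the general idea of uncertainty-aware gradient aggregation: weight each task's gradient by an inverse-variance estimate before combining, so noisier tasks contribute less. The variance estimates are assumed given here; the paper infers them with a Bayesian treatment:

    import numpy as np

    def aggregate(task_grads, task_vars, eps=1e-8):
        # Inverse-variance weighting: noisier task gradients count less.
        w = np.array([1.0 / (v + eps) for v in task_vars])
        w = w / w.sum()
        return sum(wi * g for wi, g in zip(w, task_grads))
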
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Existing methods for adapting large language models (LLMs) to new tasks are not suited to
multi-task adaptation because they modify all the model weights, causing destructive …
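
In the lottery-ticket spirit, one way to localize a task's update is to keep only the largest-magnitude weight changes and zero out the rest, so concurrent tasks touch mostly disjoint parameters. A sketch under that assumption, not the paper's exact procedure:

    import numpy as np

    def sparsify_update(base, finetuned, keep_frac=0.1):
        # Keep only the top keep_frac of the update by magnitude.
        delta = finetuned - base
        k = max(1, int(keep_frac * delta.size))
        threshold = np.sort(np.abs(delta).ravel())[-k]
        mask = np.abs(delta) >= threshold
        return base + delta * mask
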
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction
Supervised fine-tuning (SFT) on an instruction-following corpus is a crucial approach toward
the alignment of large language models (LLMs). However, the performance of LLMs on …
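
The title suggests a disperse-then-merge pipeline: split the instruction data into shards, fine-tune one sub-model per shard, then merge the sub-models by weight averaging. A sketch with `fine_tune` as a placeholder for a real SFT run:

    import numpy as np

    def disperse_then_merge(base, data, n_shards, fine_tune):
        # fine_tune(base, shard) -> dict of fine-tuned weights (placeholder).
        shards = np.array_split(np.asarray(data, dtype=object), n_shards)
        submodels = [fine_tune(base, shard) for shard in shards]
        return {k: np.mean([m[k] for m in submodels], axis=0)
                for k in base}
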
You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging
W. Chen and J. Kwok, arXiv preprint arXiv:2408.12105, 2024
Model merging, which combines multiple models into a single model, has gained increasing
popularity in recent years. By efficiently integrating the capabilities of various models without …
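
A single preference-conditioned merge can be sketched as mapping a preference vector over tasks to merging coefficients; the paper's contribution is learning the whole Pareto set of such merges at once, which this sketch does not attempt:

    import numpy as np

    def merge_for_preference(base, task_models, preference):
        # preference: nonnegative weights over tasks, normalized to sum to 1.
        p = np.asarray(preference, dtype=float)
        p = p / p.sum()
        return {k: base[k] + sum(pi * (tm[k] - base[k])
                                 for pi, tm in zip(p, task_models))
                for k in base}
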