Evolutionary optimization of model merging recipes

T Akiba, M Shing, Y Tang, Q Sun, D Ha - arXiv preprint arXiv:2403.13187, 2024 - arxiv.org
We present a novel application of evolutionary algorithms to automate the creation of
powerful foundation models. While model merging has emerged as a promising approach …
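
A minimal sketch of the general idea of searching for a merging recipe with an evolutionary loop: a simple (1+lambda) strategy mutates per-layer interpolation weights between two fine-tuned checkpoints and keeps whatever scores best. The fitness function, checkpoints, and hyperparameters here are hypothetical placeholders, not the paper's actual recipe search.

    import numpy as np

    def merge(theta_a, theta_b, alphas):
        """Interpolate two state dicts layer by layer; alphas lie in [0, 1]."""
        return {k: a * theta_a[k] + (1 - a) * theta_b[k]
                for k, a in zip(theta_a, alphas)}

    def evolve_recipe(theta_a, theta_b, fitness, pop=16, gens=50, sigma=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n_layers = len(theta_a)
        best = np.full(n_layers, 0.5)          # start from uniform averaging
        best_score = fitness(merge(theta_a, theta_b, best))
        for _ in range(gens):
            # mutate the current best recipe and keep any improvement
            cands = np.clip(best + sigma * rng.standard_normal((pop, n_layers)), 0.0, 1.0)
            for c in cands:
                score = fitness(merge(theta_a, theta_b, c))
                if score > best_score:
                    best, best_score = c, score
        return best, best_score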

Towards modular LLMs by building and reusing a library of LoRAs

O Ostapenko, Z Su, EM Ponti, L Charlin… - arXiv preprint arXiv …, 2024 - arxiv.org
The growing number of parameter-efficient adaptations of a base large language model
(LLM) calls for studying whether we can reuse such trained adapters to improve …
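
The basic operation behind reusing trained adapters from a library is folding a LoRA's low-rank update back into a base weight matrix. A minimal sketch under the usual LoRA formulation (W + (alpha / rank) * B @ A); the arrays and the alpha/rank values are illustrative placeholders, not a specific library's API.

    import numpy as np

    def fold_lora(W, A, B, alpha=16.0, rank=8):
        """Return the merged weight W + (alpha / rank) * B @ A."""
        return W + (alpha / rank) * (B @ A)

    d_out, d_in, rank = 64, 64, 8
    W = np.random.randn(d_out, d_in)   # frozen base weight
    A = np.random.randn(rank, d_in)    # LoRA down-projection
    B = np.random.randn(d_out, rank)   # LoRA up-projection
    W_merged = fold_lora(W, A, B, rank=rank)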

Variational learning is effective for large deep networks

Y Shen, N Daheim, B Cong, P Nickl… - arXiv preprint arXiv …, 2024 - arxiv.org
We give extensive empirical evidence against the common belief that variational learning is
ineffective for large neural networks. We show that an optimizer called Improved Variational …

Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging is an efficient technique in the machine learning community that does not
require collecting raw training data and does not require expensive …

A Practitioner's Guide to Continual Multimodal Pretraining

K Roth, V Udandarao, S Dziadzio, A Prabhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal foundation models serve numerous applications at the intersection of vision and
language. Still, despite being pretrained on extensive data, they become outdated over time …

FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis

S Sanjeev, N Zhaksylyk, I Almakky… - arXiv preprint arXiv …, 2024 - arxiv.org
The scarcity of well-annotated medical datasets requires leveraging transfer learning from
broader datasets like ImageNet or pre-trained models like CLIP. The model soups approach averages …
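
Since the snippet refers to model soups averaging fine-tuned weights, a minimal sketch of a uniform soup over several checkpoints with the same architecture follows. The state dicts below are placeholders; real checkpoints would be loaded from disk.

    import numpy as np

    def uniform_soup(state_dicts):
        """Average the parameters of identically shaped checkpoints key by key."""
        keys = state_dicts[0].keys()
        return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

    checkpoints = [{"layer.weight": np.random.randn(4, 4)} for _ in range(3)]
    souped = uniform_soup(checkpoints)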

Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning

I Achituve, I Diamant, A Netzer, G Chechik… - arXiv preprint arXiv …, 2024 - arxiv.org
As machine learning becomes more prominent, there is a growing demand to perform
several inference tasks in parallel. Running a dedicated model for each task is …

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

A Panda, B Isik, X Qi, S Koyejo, T Weissman… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing methods for adapting large language models (LLMs) to new tasks are not suited to
multi-task adaptation because they modify all the model weights--causing destructive …
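
The snippet contrasts with methods that modify all model weights; one way to avoid that is to keep only a sparse subset of the fine-tuning update so most base weights stay untouched. A minimal sketch of that idea: the top-magnitude criterion and the 10% sparsity level are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def sparse_update(w_base, w_tuned, keep_frac=0.1):
        """Apply only the largest-magnitude entries of the fine-tuning delta."""
        delta = w_tuned - w_base
        k = max(1, int(keep_frac * delta.size))
        thresh = np.partition(np.abs(delta).ravel(), -k)[-k]
        mask = np.abs(delta) >= thresh      # keep the largest-magnitude changes
        return w_base + mask * delta

    w_base = np.random.randn(128, 128)
    w_tuned = w_base + 0.01 * np.random.randn(128, 128)
    w_adapted = sparse_update(w_base, w_tuned)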

Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

T Fu, D Cai, L Liu, S Shi, R Yan - arXiv preprint arXiv:2405.13432, 2024 - arxiv.org
Supervised fine-tuning (SFT) on an instruction-following corpus is a crucial approach toward
the alignment of large language models (LLMs). However, the performance of LLMs on …

You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging

W Chen, J Kwok - arXiv preprint arXiv:2408.12105, 2024 - arxiv.org
Model merging, which combines multiple models into a single model, has gained increasing
popularity in recent years. By efficiently integrating the capabilities of various models without …
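
A minimal sketch of preference-aware merging in its simplest linear form: a preference vector over tasks picks one convex combination of task-specific models, and sweeping the preference traces out a family of trade-off models. The placeholder state dicts and the plain weighted average are assumptions for illustration, not the paper's learned Pareto set.

    import numpy as np

    def merge_by_preference(task_models, preference):
        """Convex combination of state dicts; preference is normalised to sum to 1."""
        preference = np.asarray(preference, dtype=float)
        preference = preference / preference.sum()
        keys = task_models[0].keys()
        return {k: sum(p * m[k] for p, m in zip(preference, task_models))
                for k in keys}

    models = [{"w": np.random.randn(4, 4)} for _ in range(2)]
    for lam in np.linspace(0.0, 1.0, 5):   # sweep the two-task trade-off
        merged = merge_by_preference(models, [lam, 1.0 - lam])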