LLM Merging: Building LLMs Efficiently through Merging

D Tam, M Li, P Yadav, RB Gabrielsson… - NeurIPS 2024 …, 2024 - openreview.net
Training high-performing large language models (LLMs) from scratch is a notoriously
expensive and difficult task, costing hundreds of millions of dollars in compute alone. These …

PLeaS -- Merging Models with Permutations and Least Squares

A Nasery, J Hayase, PW Koh, S Oh - arXiv preprint arXiv:2407.02447, 2024 - arxiv.org
The democratization of machine learning systems has made the process of fine-tuning
accessible to a large number of practitioners, leading to a wide range of open-source …

SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques

A Khan, T Nief, N Hudson, M Sakarvadia… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding neural networks is crucial to creating reliable and trustworthy deep learning
models. Most contemporary research in interpretability analyzes just one model at a time via …

Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks

R Alcover-Couso, JC SanMiguel… - arXiv preprint arXiv …, 2024 - arxiv.org
Merging parameters of multiple models has resurfaced as an effective strategy to enhance
task performance and robustness, but prior work is limited by the high costs of ensemble …
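The snippet above refers to merging model parameters layer by layer. As a rough, hypothetical illustration of that general idea only (not this paper's specific method), the Python sketch below interpolates two PyTorch state dicts with per-layer mixing coefficients; the `merge_layerwise` helper and the `alphas` weights are assumptions for the example.

```python
# Hypothetical sketch of layer-wise parameter merging (not the paper's method):
# two models with identical architectures are combined by interpolating each
# parameter tensor with its own mixing coefficient. The `alphas` dictionary is
# an assumption; in practice per-layer weights would be chosen, e.g., by
# validation performance on the target domain.
import torch


def merge_layerwise(state_a, state_b, alphas, default_alpha=0.5):
    """Interpolate two state dicts parameter by parameter.

    `alphas` maps a parameter name to a coefficient in [0, 1]; names not
    listed fall back to a plain average controlled by `default_alpha`.
    """
    merged = {}
    for name, tensor_a in state_a.items():
        alpha = alphas.get(name, default_alpha)
        merged[name] = alpha * tensor_a + (1.0 - alpha) * state_b[name]
    return merged


# Toy usage with two identically shaped linear layers.
a, b = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
merged = merge_layerwise(a.state_dict(), b.state_dict(), alphas={"weight": 0.7})
a.load_state_dict(merged)  # load the merged parameters back into a model
```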

Rethink Model Re-Basin and the Linear Mode Connectivity

X Qu, S Horvath - arXiv preprint arXiv:2402.05966, 2024 - arxiv.org
Recent studies suggest that with sufficiently wide models, most SGD solutions can, up to
permutation, converge into the same basin. This phenomenon, known as the model re-basin …
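The re-basin claim above is typically tested by permuting one model's units into alignment with another's and then checking that the straight line between the two weight vectors stays in a low-loss region. As a hedged illustration of that interpolation check only (the permutation-alignment step is omitted, and `eval_loss` is a hypothetical routine returning the loss of a state dict on held-out data):

```python
# Minimal sketch of a linear mode connectivity check, assuming the two models
# share an architecture and have already been permutation-aligned.
def interpolate_state(state_a, state_b, t):
    """Pointwise interpolation (1 - t) * A + t * B of two state dicts."""
    return {k: (1.0 - t) * state_a[k] + t * state_b[k] for k in state_a}


def loss_barrier(state_a, state_b, eval_loss, steps=11):
    """Estimate the loss barrier along the straight path between two solutions.

    A barrier near zero suggests the (aligned) solutions sit in the same
    basin, i.e. they are linearly mode connected.
    """
    losses = [
        eval_loss(interpolate_state(state_a, state_b, i / (steps - 1)))
        for i in range(steps)
    ]
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```

If the barrier is large for the raw pair of models but collapses toward zero once a suitable permutation is applied, that is the re-basin behaviour the snippet describes.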