Cycle-Consistent Multi-Model Merging
arXiv preprint arXiv:2405.17897, 2024
In this paper, we present a novel data-free method for merging neural networks in weight space. Unlike most existing works, our method optimizes the permutations of network neurons globally across all layers. This allows us to enforce cycle consistency of the permutations when merging $N \geq 3$ models, so that circular compositions of permutations can be computed without accumulating error along the path. We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. Finally, we show that, when coupled with activation renormalization, our approach yields the best results on this task.
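To make the cycle-consistency constraint concrete, the sketch below (an illustrative example, not the authors' implementation) represents neuron permutations as permutation matrices and checks that, when the map from the last model back to the first is the inverse of the composed forward path, composing permutations around a closed loop of three models returns the identity, so no error accumulates along the cycle. The toy merge at the end averages weights after aligning all models to a common neuron ordering. All names (`P_ab`, `W_merged`, the single-layer setup) are assumptions made for illustration.

```python
# Illustrative sketch of cycle-consistent permutations for merging N >= 3 models.
# Not the paper's algorithm: just the consistency property and a toy weight average.
import numpy as np


def perm_matrix(perm):
    """Build a permutation matrix P such that (P @ x)[i] = x[perm[i]]."""
    n = len(perm)
    P = np.zeros((n, n))
    P[np.arange(n), perm] = 1.0
    return P


rng = np.random.default_rng(0)
n = 4  # hypothetical number of neurons in one layer

# Hypothetical pairwise permutations: A -> B and B -> C (column-vector convention,
# i.e. activations expressed in B's ordering are P_ab @ activations in A's ordering).
P_ab = perm_matrix(rng.permutation(n))
P_bc = perm_matrix(rng.permutation(n))

# Cycle consistency: the map C -> A must be the inverse of the composed path
# A -> B -> C, so the loop A -> B -> C -> A composes to the identity.
P_ca = (P_bc @ P_ab).T  # a permutation matrix's inverse is its transpose

cycle = P_ca @ P_bc @ P_ab
assert np.allclose(cycle, np.eye(n)), "closed loop must compose to the identity"

# Toy merge: once every model is expressed in A's neuron ordering, merging in
# weight space can be a simple average of the aligned weights. Here we permute
# output neurons of a single layer only; real networks also need the matching
# permutation on the next layer's inputs, which is omitted for brevity.
W_a = rng.normal(size=(n, n))
W_b = rng.normal(size=(n, n))
W_c = rng.normal(size=(n, n))

W_b_aligned = P_ab.T @ W_b            # B's ordering -> A's ordering
W_c_aligned = (P_bc @ P_ab).T @ W_c   # C's ordering -> A's ordering
W_merged = (W_a + W_b_aligned + W_c_aligned) / 3.0
print(W_merged.shape)
```

Because every permutation is expressed relative to a single reference ordering, the composition along any path between two models is uniquely determined, which is exactly what the cycle-consistency assertion above verifies on this toy example.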