Transformers learn in-context by gradient descent
J Von Oswald, E Niklasson… - International …, 2023 - proceedings.mlr.press
At present, the mechanisms of in-context learning in Transformers are not well understood
and remain mostly an intuition. In this paper, we suggest that training Transformers on auto …
Ties-merging: Resolving interference when merging models
Transfer learning, i.e., further fine-tuning a pre-trained model on a downstream task, can
confer significant advantages, including improved downstream performance, faster …
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …
Patching open-vocabulary models by interpolating weights
Open-vocabulary models like CLIP achieve high accuracy across many image classification
tasks. However, there are still settings where their zero-shot performance is far from optimal …
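The title points to patching by interpolating in weight space between the original zero-shot model and a task-specific fine-tuned copy. Below is a minimal sketch of that idea, not the paper's exact procedure; the names zero_shot_sd, finetuned_sd, and alpha are illustrative assumptions.

```python
def interpolate_weights(zero_shot_sd, finetuned_sd, alpha=0.5):
    """Sketch of weight-space patching: theta = (1 - alpha) * theta_zs + alpha * theta_ft.

    Both arguments are assumed to be state dicts (parameter name -> tensor/array)
    of the same architecture; alpha would typically be chosen on held-out data.
    """
    return {
        name: (1.0 - alpha) * zero_shot_sd[name] + alpha * finetuned_sd[name]
        for name in zero_shot_sd
    }

# Hypothetical usage with a PyTorch model class `Model`:
# patched = Model()
# patched.load_state_dict(interpolate_weights(zs.state_dict(), ft.state_dict(), alpha=0.8))
```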
Branch-train-merge: Embarrassingly parallel training of expert language models
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for
embarrassingly parallel training of large language models (LLMs). We show it is possible to …
Revisiting weighted aggregation in federated learning with neural networks
In federated learning (FL), weighted aggregation of local models is conducted to generate a
global model, and the aggregation weights are normalized (the sum of weights is 1) and …
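The snippet describes building the global model as a weighted aggregate of local client models, with the weights normalized to sum to 1. A minimal FedAvg-style sketch under that reading follows; the function name and arguments are illustrative, not the paper's API.

```python
def aggregate(client_state_dicts, client_sizes):
    """Sketch of weighted aggregation: average client parameters with normalized weights.

    client_state_dicts: list of state dicts (parameter name -> tensor/array), one per client.
    client_sizes: e.g. local dataset sizes, used here to derive the aggregation weights.
    """
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]  # normalized so that sum(weights) == 1
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, client_state_dicts))
        for name in client_state_dicts[0]
    }
```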
Model ratatouille: Recycling diverse models for out-of-distribution generalization
Foundation models are redefining how AI systems are built. Practitioners now follow a
standard procedure to build their machine learning solutions: from a pre-trained foundation …
Permutation equivariant neural functionals
This work studies the design of neural networks that can process the weights or gradients of
other neural networks, which we refer to as neural functional networks (NFNs). Despite a …
Equivariant architectures for learning in deep weight spaces
Designing machine learning architectures for processing neural networks in their raw weight
matrix form is a newly introduced research direction. Unfortunately, the unique symmetry …
Mechanistic mode connectivity
We study neural network loss landscapes through the lens of mode connectivity, the
observation that minimizers of neural networks retrieved via training on a dataset are …