Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …
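For orientation, "interpolation" in this title means fitting the training data exactly (zero training error). A minimal numpy sketch of that regime, with illustrative data and a polynomial model that are not from the paper:

```python
import numpy as np

# Minimal illustration of the "interpolation" regime: a model with
# enough parameters to fit every training point exactly.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-1, 1, 10))
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=10)

# A degree n-1 polynomial through n distinct points interpolates them:
# training error is zero up to floating-point conditioning.
coeffs = np.polyfit(x_train, y_train, deg=len(x_train) - 1)
train_pred = np.polyval(coeffs, x_train)
print("max train error:", np.max(np.abs(train_pred - y_train)))  # ~0
```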
The role of permutation invariance in linear mode connectivity of neural networks
In this paper, we conjecture that if the permutation invariance of neural networks is taken into
account, SGD solutions will likely have no barrier in the linear interpolation between them …
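The conjecture can be probed empirically by measuring the loss barrier along the linear path between two trained solutions. A minimal sketch, assuming a user-supplied loss_fn over flat parameter vectors and two (permutation-aligned) solutions theta_a, theta_b; all names here are hypothetical:

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, num_alphas=25):
    """Max excess loss along the linear path between two solutions.

    loss_fn: maps a flat parameter vector to a scalar loss (assumed given).
    A barrier near zero is what the paper's conjecture predicts once the
    units of one network are permuted to align with the other.
    """
    alphas = np.linspace(0.0, 1.0, num_alphas)
    path_losses = [loss_fn((1 - a) * theta_a + a * theta_b) for a in alphas]
    endpoint = max(loss_fn(theta_a), loss_fn(theta_b))
    return max(path_losses) - endpoint
```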
Decentralized federated averaging
Federated averaging (FedAvg) is a communication-efficient algorithm for distributed training
with an enormous number of clients. In FedAvg, clients keep their data locally for privacy …
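For context, one round of standard (centralized) FedAvg looks as follows; a minimal numpy sketch on an illustrative least-squares task. The decentralized variant studied in the paper replaces the server's global average with averaging among neighbors on a communication graph.

```python
import numpy as np

def fedavg_round(global_w, client_data, local_steps=5, lr=0.1):
    """One FedAvg round for a least-squares model (illustrative task).

    Each client trains locally on its own (X, y) shard (the data never
    leaves the client), and the server averages the resulting models,
    weighted by shard size.
    """
    new_ws, sizes = [], []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):              # local SGD steps
            grad = X.T @ (X @ w - y) / len(y)     # least-squares gradient
            w -= lr * grad
        new_ws.append(w)
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)
    return sum(wk * wt for wk, wt in zip(new_ws, weights))
```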
BOME! Bilevel optimization made easy: a simple first-order approach
Bilevel optimization (BO) is useful for solving a variety of important machine learning
problems including but not limited to hyperparameter optimization, meta-learning, continual …
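For context, the bilevel problem and the value-function reformulation on which first-order approaches in this line of work are built; whether this matches the paper's exact construction is left to the paper itself:

```latex
% Bilevel problem:
%   min_x f(x, y*(x))   s.t.   y*(x) in argmin_y g(x, y)
% Value-function reformulation used by first-order methods:
\begin{align*}
  \min_{x,\,y}\quad & f(x, y) \\
  \text{s.t.}\quad  & g(x, y) - g^{*}(x) \le 0,
  \qquad g^{*}(x) = \min_{y'} g(x, y').
\end{align*}
```

In practice $g^{*}(x)$ is typically approximated with a few inner gradient steps, so the whole scheme needs only first-order information rather than second-order hypergradients.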
Learning linear causal representations from interventions under general nonlinear mixing
S Buchholz, G Rajendran… - Advances in …, 2024 - proceedings.neurips.cc
We study the problem of learning causal representations from unknown, latent interventions
in a general setting, where the latent distribution is Gaussian but the mixing function is …
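A sketch of the kind of data-generating process described: a linear-Gaussian latent SCM, an unknown nonlinear mixing (here an arbitrary random MLP, purely illustrative), and single-node interventions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 1000

# Latent linear-Gaussian SCM: z = A z + eps, A strictly lower-triangular.
A = np.tril(rng.normal(size=(d, d)), k=-1)

def sample_latents(n, intervene_on=None):
    eps = rng.normal(size=(n, d))
    z = np.zeros((n, d))
    for j in range(d):                      # ancestral sampling
        z[:, j] = z @ A[j] + eps[:, j]
        if j == intervene_on:               # perfect intervention: cut parents
            z[:, j] = rng.normal(loc=2.0, size=n)
    return z

# Unknown nonlinear mixing: a random one-hidden-layer MLP (illustrative).
W1, W2 = rng.normal(size=(d, 16)), rng.normal(size=(16, d))
mix = lambda z: np.tanh(z @ W1) @ W2

x_obs = mix(sample_latents(n))                    # observational data
x_int = mix(sample_latents(n, intervene_on=1))    # data under one intervention
```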
Full stack optimization of transformer inference: a survey
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …
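As one concrete example of the inference optimizations such a survey covers, symmetric per-tensor int8 weight quantization; a minimal sketch, not code from the survey:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization, one of the standard
    memory/bandwidth optimizations for Transformer inference."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.max(np.abs(dequantize(q, s) - w)))
```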
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
E Boursier, L Pillaud-Vivien… - Advances in Neural …, 2022 - proceedings.neurips.cc
The training of neural networks by gradient descent methods is a cornerstone of the deep
learning revolution. Yet, despite some recent progress, a complete theory explaining its …
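The paper's setting can be reproduced in a few lines: a one-hidden-layer ReLU network trained on the square loss with orthogonal inputs, using full-batch gradient descent as a discretization of the gradient flow. Width, initialization scale, and step size below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = d = 4                                  # orthogonal inputs: X = I
X = np.eye(n)
y = rng.normal(size=n)

m = 50                                     # hidden width
W = 0.1 * rng.normal(size=(m, d))          # input weights
a = 0.1 * rng.normal(size=m)               # output weights

lr = 0.05
for _ in range(2000):                      # gradient descent ~ gradient flow
    h = np.maximum(X @ W.T, 0.0)           # ReLU features, shape (n, m)
    r = h @ a - y                          # residuals
    grad_a = h.T @ r / n
    grad_W = ((r[:, None] * (h > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
print("square loss:", 0.5 * np.mean(r ** 2))
```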
On penalty-based bilevel gradient descent method
Bilevel optimization enjoys a wide range of applications in hyper-parameter optimization,
meta-learning and reinforcement learning. However, bilevel problems are difficult to solve …
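A toy sketch of the penalty idea: replace the bilevel program with a sequence of single-level problems min_{x,y} f(x,y) + lam * g(x,y) for growing lam. The concrete penalty and schedule in the paper may differ; the toy problem below is chosen so the bilevel solution (x = y = 1/2) is known in closed form:

```python
# Toy bilevel problem:
#   upper level  f(x, y) = (x - 1)^2 + y^2
#   lower level  y*(x) = argmin_y g(x, y),  g(x, y) = 0.5 * (y - x)^2
# so y*(x) = x and the bilevel solution is x = argmin (x-1)^2 + x^2 = 0.5.
def grad_f(x, y):
    return 2 * (x - 1), 2 * y

def grad_g(x, y):
    return -(y - x), (y - x)

x, y, lr = 0.0, 0.0, 0.05
for lam in [1, 10, 100, 1000]:             # increasing penalty weight
    for _ in range(5000):                  # gradient descent on f + lam * g
        fx, fy = grad_f(x, y)
        gx, gy = grad_g(x, y)
        x -= lr / lam * (fx + lam * gx)    # step scaled by 1/lam for stability
        y -= lr / lam * (fy + lam * gy)
print(x, y)  # both approach 0.5 as lam grows
```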
Neural collapse with normalized features: A geometric analysis over the Riemannian manifold
When training overparameterized deep networks for classification tasks, it has been widely
observed that the learned features exhibit a so-called "neural collapse" phenomenon. More …
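Concretely, neural collapse refers to the last-layer class-mean features converging to a simplex equiangular tight frame (ETF). A small check of the canonical ETF's defining properties (the paper's contribution, a Riemannian analysis under feature normalization, is not reproduced here):

```python
import numpy as np

# Canonical C-class simplex ETF: unit-norm class means with equal
# pairwise inner products of -1/(C-1), the configuration that
# collapsed features are observed to converge to.
C = 5
M = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

gram = M.T @ M
print(np.diag(gram))          # all ones: unit-norm class means
print(gram[0, 1])             # off-diagonals all equal -1/(C-1)
```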
High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance
In recent years, the interest of the optimization and machine learning communities in
high-probability convergence of stochastic optimization methods has been growing. One of …
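Methods analyzed in this line of work typically rely on gradient clipping to obtain high-probability guarantees under heavy-tailed (unbounded-variance) noise; a minimal sketch of one clipped-SGD step, with illustrative names:

```python
import numpy as np

def clipped_sgd_step(w, stoch_grad, lr, clip_level):
    """One step of clipped SGD: rescale the stochastic gradient so its
    norm never exceeds clip_level, which tames heavy-tailed noise and
    is the standard device behind high-probability convergence bounds."""
    g_norm = np.linalg.norm(stoch_grad)
    if g_norm > clip_level:
        stoch_grad = stoch_grad * (clip_level / g_norm)
    return w - lr * stoch_grad
```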