Learning a neuron by a shallow ReLU network: Dynamics and implicit bias for correlated inputs
We prove that, for the fundamental regression task of learning a single neuron, training a
one-hidden layer ReLU network of any width by gradient flow from a small initialisation …
Early alignment in two-layer networks training is a two-edged sword
E Boursier, N Flammarion - arXiv preprint arXiv:2401.10791, 2024 - arxiv.org
Training neural networks with first order optimisation methods is at the core of the empirical
success of deep learning. The scale of initialisation is a crucial factor, as small initialisations …
Directional convergence near small initializations and saddles in two-homogeneous neural networks
A Kumar, J Haupt - arXiv preprint arXiv:2402.09226, 2024 - arxiv.org
This paper examines gradient flow dynamics of two-homogeneous neural networks for small
initializations, where all weights are initialized near the origin. For both square and logistic …
A Theory of Unimodal Bias in Multimodal Learning
Using multiple input streams simultaneously in training multimodal neural networks is
intuitively advantageous, but practically challenging. A key challenge is unimodal bias …
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
While the impressive performance of modern neural networks is often attributed to their
capacity to efficiently extract task-relevant features from data, the mechanisms underlying …
Can Implicit Bias Imply Adversarial Robustness?
The implicit bias of gradient-based training algorithms has been considered mostly
beneficial as it leads to trained networks that often generalize well. However, Frei et …
Simplicity bias and optimization threshold in two-layer ReLU networks
E Boursier, N Flammarion - arXiv preprint arXiv:2410.02348, 2024 - arxiv.org
Understanding generalization of overparametrized neural networks remains a fundamental
challenge in machine learning. Most of the literature studies generalization from an …
When Are Bias-Free ReLU Networks Like Linear Networks?
We investigate the expressivity and learning dynamics of bias-free ReLU networks. We first
show that two-layer bias-free ReLU networks have limited expressivity: the only odd function …
ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models
The goal of continual learning (CL) is to train a model that can solve multiple tasks
presented sequentially. Recent CL approaches have achieved strong performance by …
Analyzing Multi-Stage Loss Curve: Plateau and Descent Mechanisms in Neural Networks
ZA Chen, T Luo, GH Wang - arXiv preprint arXiv:2410.20119, 2024 - arxiv.org
The multi-stage phenomenon in the training loss curves of neural networks has been widely
observed, reflecting the non-linearity and complexity inherent in the training process. In this …