Learning a neuron by a shallow ReLU network: Dynamics and implicit bias for correlated inputs

D Chistikov, M Englert, R Lazic - Advances in Neural …, 2023 - proceedings.neurips.cc
We prove that, for the fundamental regression task of learning a single neuron, training a
one-hidden layer ReLU network of any width by gradient flow from a small initialisation …

Early alignment in two-layer networks training is a two-edged sword

E Boursier, N Flammarion - arXiv preprint arXiv:2401.10791, 2024 - arxiv.org
Training neural networks with first order optimisation methods is at the core of the empirical
success of deep learning. The scale of initialisation is a crucial factor, as small initialisations …

Directional convergence near small initializations and saddles in two-homogeneous neural networks

A Kumar, J Haupt - arXiv preprint arXiv:2402.09226, 2024 - arxiv.org
This paper examines gradient flow dynamics of two-homogeneous neural networks for small
initializations, where all weights are initialized near the origin. For both square and logistic …

A Theory of Unimodal Bias in Multimodal Learning

Y Zhang, PE Latham, A Saxe - arXiv preprint arXiv:2312.00935, 2023 - arxiv.org
Using multiple input streams simultaneously in training multimodal neural networks is
intuitively advantageous, but practically challenging. A key challenge is unimodal bias …

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

D Kunin, A Raventós, C Dominé, F Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
While the impressive performance of modern neural networks is often attributed to their
capacity to efficiently extract task-relevant features from data, the mechanisms underlying …

Can Implicit Bias Imply Adversarial Robustness?

H Min, R Vidal - arXiv preprint arXiv:2405.15942, 2024 - arxiv.org
The implicit bias of gradient-based training algorithms has been considered mostly
beneficial as it leads to trained networks that often generalize well. However, Frei et …

Simplicity bias and optimization threshold in two-layer ReLU networks

E Boursier, N Flammarion - arXiv preprint arXiv:2410.02348, 2024 - arxiv.org
Understanding generalization of overparametrized neural networks remains a fundamental
challenge in machine learning. Most of the literature studies generalization from an …

When Are Bias-Free ReLU Networks Like Linear Networks?

Y Zhang, A Saxe, PE Latham - arXiv preprint arXiv:2406.12615, 2024 - arxiv.org
We investigate the expressivity and learning dynamics of bias-free ReLU networks. We firstly
show that two-layer bias-free ReLU networks have limited expressivity: the only odd function …

ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models

L Peng, J Elenter, J Agterberg, A Ribeiro… - arXiv preprint arXiv …, 2024 - arxiv.org
The goal of continual learning (CL) is to train a model that can solve multiple tasks
presented sequentially. Recent CL approaches have achieved strong performance by …

Analyzing Multi-Stage Loss Curve: Plateau and Descent Mechanisms in Neural Networks

ZA Chen, T Luo, GH Wang - arXiv preprint arXiv:2410.20119, 2024 - arxiv.org
The multi-stage phenomenon in the training loss curves of neural networks has been widely
observed, reflecting the non-linearity and complexity inherent in the training process. In this …