Learning a neuron by a shallow ReLU network: Dynamics and implicit bias for correlated inputs
We prove that, for the fundamental regression task of learning a single neuron, training a
one-hidden layer ReLU network of any width by gradient flow from a small initialisation …
Early alignment in two-layer networks training is a two-edged sword
E Boursier, N Flammarion - arXiv preprint arXiv:2401.10791, 2024 - arxiv.org
Training neural networks with first order optimisation methods is at the core of the empirical
success of deep learning. The scale of initialisation is a crucial factor, as small initialisations …
Directional convergence near small initializations and saddles in two-homogeneous neural networks
A Kumar, J Haupt - arXiv preprint arXiv:2402.09226, 2024 - arxiv.org
This paper examines gradient flow dynamics of two-homogeneous neural networks for small
initializations, where all weights are initialized near the origin. For both square and logistic …
A Theory of Unimodal Bias in Multimodal Learning
Using multiple input streams simultaneously in training multimodal neural networks is
intuitively advantageous, but practically challenging. A key challenge is unimodal bias …
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
While the impressive performance of modern neural networks is often attributed to their
capacity to efficiently extract task-relevant features from data, the mechanisms underlying …
Can Implicit Bias Imply Adversarial Robustness?
The implicit bias of gradient-based training algorithms has been considered mostly
beneficial as it leads to trained networks that often generalize well. However, Frei et …
Simplicity bias and optimization threshold in two-layer ReLU networks
E Boursier, N Flammarion - arXiv preprint arXiv:2410.02348, 2024 - arxiv.org
Understanding generalization of overparametrized neural networks remains a fundamental
challenge in machine learning. Most of the literature studies generalization from an …
When Are Bias-Free ReLU Networks Like Linear Networks?
We investigate the expressivity and learning dynamics of bias-free ReLU networks. We first
show that two-layer bias-free ReLU networks have limited expressivity: the only odd function …
ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models
The goal of continual learning (CL) is to train a model that can solve multiple tasks
presented sequentially. Recent CL approaches have achieved strong performance by …
Analyzing Multi-Stage Loss Curve: Plateau and Descent Mechanisms in Neural Networks
ZA Chen, T Luo, GH Wang - arXiv preprint arXiv:2410.20119, 2024 - arxiv.org
The multi-stage phenomenon in the training loss curves of neural networks has been widely
observed, reflecting the non-linearity and complexity inherent in the training process. In this …