Sampling weights of deep neural networks

EL Bolager, I Burak, C Datar, Q Sun… - Advances in Neural …, 2023 - proceedings.neurips.cc
We introduce a probability distribution, combined with an efficient sampling algorithm, for
weights and biases of fully-connected neural networks. In a supervised learning context, no …

Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension

M Haas, D Holzmüller, U Luxburg… - Advances in Neural …, 2024 - proceedings.neurips.cc
The success of over-parameterized neural networks trained to near-zero training error has
caused great interest in the phenomenon of benign overfitting, where estimators are …

Why shallow networks struggle with approximating and learning high frequency: A numerical study

S Zhang, H Zhao, Y Zhong, H Zhou - arXiv preprint arXiv:2306.17301, 2023 - arxiv.org
In this work, a comprehensive numerical study involving analysis and experiments shows
why a two-layer neural network has difficulties handling high frequencies in approximation …

On the omnipresence of spurious local minima in certain neural network training problems

C Christof, J Kowalczyk - Constructive Approximation, 2023 - Springer
We study the loss landscape of training problems for deep artificial neural networks with a
one-dimensional real output whose activation functions contain an affine segment and …

How to Train an Artificial Neural Network to Predict Higher Heating Values of Biofuel

A Matveeva, A Bychkov - Energies, 2022 - mdpi.com
Plant biomass is one of the most promising and easy-to-use sources of renewable energy.
Direct determination of higher heating values of fuel in an adiabatic calorimeter is too …

When Are Bias-Free ReLU Networks Like Linear Networks?

Y Zhang, A Saxe, PE Latham - arXiv preprint arXiv:2406.12615, 2024 - arxiv.org
We investigate the expressivity and learning dynamics of bias-free ReLU networks. We first
show that two-layer bias-free ReLU networks have limited expressivity: the only odd function …

Generative Feature Training of Thin 2-Layer Networks

J Hertrich, S Neumayer - arXiv preprint arXiv:2411.06848, 2024 - arxiv.org
We consider the approximation of functions by 2-layer neural networks with a small number
of hidden weights based on the squared loss and small datasets. Due to the highly non …

Critical point-finding methods reveal gradient-flat regions of deep network losses

CG Frye, J Simon, NS Wadia, A Ligeralde… - Neural …, 2021 - direct.mit.edu
Despite the fact that the loss functions of deep neural networks are highly nonconvex,
gradient-based optimization algorithms converge to approximately the same performance …

Regression from linear models to neural networks: double descent, active learning, and sampling

D Holzmüller - 2023 - elib.uni-stuttgart.de
Regression, that is, the approximation of functions from (noisy) data, is a ubiquitous task in
machine learning and beyond. In this thesis, we study regression in three different settings …

Persistent Neurons

Y Min - arXiv preprint arXiv:2007.01419, 2020 - arxiv.org
Neural network (NN)-based learning algorithms are strongly affected by the choices of
initialization and data distribution. Different optimization strategies have been proposed for …