Training two-layer ReLU networks with gradient descent is inconsistent
D Holzmüller, I Steinwart - Journal of Machine Learning Research, 2022 - jmlr.org
We prove that two-layer (Leaky) ReLU networks initialized by, e.g., the widely used method
proposed by He et al. (2015) and trained using gradient descent on a least-squares loss are …
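The abstract refers to a standard setup: a two-layer ReLU network with He et al. (2015) initialization, trained by full-batch gradient descent on a least-squares loss. A minimal PyTorch sketch of that setup follows; the toy data, network width, learning rate, and step count are illustrative assumptions, not values from the paper.

```python
import torch

# Hypothetical toy regression problem (not from the paper).
torch.manual_seed(0)
X = torch.rand(256, 1) * 2 - 1                     # inputs in [-1, 1]
y = torch.sin(3 * X) + 0.1 * torch.randn_like(X)   # noisy 1-D target

width = 64  # assumed hidden width for illustration
net = torch.nn.Sequential(
    torch.nn.Linear(1, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
# He et al. (2015) initialization for the hidden layer's weights.
torch.nn.init.kaiming_normal_(net[0].weight, nonlinearity='relu')

opt = torch.optim.SGD(net.parameters(), lr=1e-2)   # plain full-batch gradient descent
loss_fn = torch.nn.MSELoss()                       # least-squares loss

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(net(X), y)   # full-batch: all samples in every step
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4f}")
```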