A theory of non-linear feature learning with one gradient step in two-layer neural networks

B Moniri, D Lee, H Hassani, E Dobriban - arXiv preprint arXiv:2310.07891, 2023 - arxiv.org
Feature learning is thought to be one of the fundamental reasons for the success of deep
neural networks. It is rigorously known that in two-layer fully-connected neural networks …

Provable multi-task representation learning by two-layer ReLU neural networks

L Collins, H Hassani, M Soltanolkotabi… - arXiv preprint arXiv …, 2023 - arxiv.org
Feature learning, i.e., extracting meaningful representations of data, is quintessential to the
practical success of neural networks trained with gradient descent, yet it is notoriously …

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

K Oko, Y Song, T Suzuki, D Wu - arXiv preprint arXiv:2406.11828, 2024 - arxiv.org
We study the computational and sample complexity of learning a target function
$f_*: \mathbb{R}^d \to \mathbb{R}$ with additive structure, that is, $f_*(x)=\frac{1}{\sqrt …

How Does Gradient Descent Learn Features--A Local Analysis for Regularized Two-Layer Neural Networks

M Zhou, R Ge - arXiv preprint arXiv:2406.01766, 2024 - arxiv.org
The ability to learn useful features is one of the major advantages of neural networks.
Although recent works show that neural networks can operate in a neural tangent kernel …

Statistical and high-dimensional perspectives on machine learning

D Lee - 2024 - repository.upenn.edu
In the first chapter, we consider the problem of calibration. While the accuracy of modern
machine learning techniques continues to improve, many models exhibit miscalibration …