Random features for kernel approximation: A survey on algorithms, theory, and beyond
The class of random features is one of the most popular techniques to speed up kernel
methods in large-scale problems. Related works have been recognized by the NeurIPS Test …
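Several entries below build on this construction, so a minimal sketch of random Fourier features for the Gaussian kernel may be useful. This is an illustration of the general technique, not code from the survey; the feature dimension D, bandwidth sigma, and the helper name are arbitrary choices.

```python
import numpy as np

def random_fourier_features(X, D=500, sigma=1.0, seed=None):
    """Map X (n, d) to Z (n, D) so that Z @ Z.T approximates the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # frequencies ~ N(0, sigma^{-2} I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Sanity check on two points: the inner product should be close to the exact kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
Z = random_fourier_features(X, D=20000, sigma=1.0, seed=1)
print(Z[0] @ Z[1], np.exp(-np.sum((X[0] - X[1]) ** 2) / 2.0))
```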
Generalization properties of learning with random features
We study the generalization properties of ridge regression with random features in the
statistical learning framework. We show for the first time that $O(1/\sqrt{n})$ learning …
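To make the estimator concrete, here is one way ridge regression on top of a random feature map can be implemented; the problem sizes, the regularization lam, and the lam * n scaling are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D, sigma, lam = 2000, 3, 300, 1.0, 1e-3

# A fixed random feature map, shared between training and prediction.
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
phi = lambda X: np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Ridge in feature space costs O(n D^2 + D^3) instead of O(n^3) for exact KRR.
Z = phi(X)
w = np.linalg.solve(Z.T @ Z + lam * n * np.eye(D), Z.T @ y)
print(phi(rng.normal(size=(5, d))) @ w)   # approximate kernel ridge predictions
```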
Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates
We study a decomposition-based scalable approach to kernel ridge regression, and show
that it achieves minimax optimal convergence rates under relatively mild conditions. The …
Early stopping and non-parametric regression: an optimal data-dependent stopping rule
G Raskutti, MJ Wainwright, B Yu - The Journal of Machine Learning …, 2014 - jmlr.org
Early stopping is a form of regularization based on choosing when to stop running an
iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert …
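The paper's rule is data-dependent and computed from the training data alone (via the empirical kernel eigenvalues), which is more than a snippet can reproduce; the sketch below only illustrates the underlying idea, early-stopped gradient descent on kernel least squares, with a plain holdout rule. The kernel, sizes, and patience threshold are all assumptions.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=200)
X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]

K, K_va = rbf(X_tr, X_tr), rbf(X_va, X_tr)
alpha = np.zeros(len(y_tr))
eta = 1.0 / np.linalg.eigvalsh(K).max()        # safe step size for the quadratic

best_err, best_alpha, best_t = np.inf, alpha.copy(), 0
for t in range(1, 2001):
    # gradient step on q(a) = 0.5 a^T K a - y^T a, i.e. an early-stopped
    # iteration toward K a = y; stopping early acts as regularization
    alpha -= eta * (K @ alpha - y_tr)
    err = np.mean((K_va @ alpha - y_va) ** 2)
    if err < best_err:
        best_err, best_alpha, best_t = err, alpha.copy(), t
    elif t - best_t > 50:                      # stop once validation error stalls
        break
print(f"stopped at t={t}; best validation MSE {best_err:.4f} at t={best_t}")
```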
Distributed learning with regularized least squares
We study distributed learning with the least squares regularization scheme in a reproducing
kernel Hilbert space (RKHS). By a divide-and-conquer approach, the algorithm partitions a …
Divide and conquer kernel ridge regression
We study a decomposition-based scalable approach to performing kernel ridge regression.
The method is simply described: it randomly partitions a dataset of size N into m subsets of …
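A minimal sketch of this divide-and-conquer scheme, assuming a Gaussian kernel, a standard KRR solve on each subset, and a simple average of the local predictors; m, lam, and the helper names are placeholders.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def dc_krr_fit(X, y, m=4, lam=1e-3, seed=None):
    """Randomly split (X, y) into m subsets and solve KRR on each subset."""
    rng = np.random.default_rng(seed)
    models = []
    for part in np.array_split(rng.permutation(len(X)), m):
        Xp, yp = X[part], y[part]
        K = rbf(Xp, Xp)
        # each local solve is O((N/m)^3) instead of O(N^3) for the full problem
        alpha = np.linalg.solve(K + lam * len(part) * np.eye(len(part)), yp)
        models.append((Xp, alpha))
    return models

def dc_krr_predict(models, X_new):
    # the final estimator is the plain average of the m local KRR estimators
    return np.mean([rbf(X_new, Xp) @ a for Xp, a in models], axis=0)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=1200)
models = dc_krr_fit(X, y, m=4, seed=1)
print(dc_krr_predict(models, np.array([[0.0], [1.5]])))
```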
Sampling from Gaussian process posteriors using stochastic gradient descent
Gaussian processes are a powerful framework for quantifying uncertainty and for sequential
decision-making but are limited by the requirement of solving linear systems. In general, this …
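The paper goes further and draws full posterior samples; purely as an illustration of the iterative-solver idea, the sketch below approximates only the posterior mean by stochastic block-coordinate gradient steps on the quadratic whose minimizer is (K + noise * I)^{-1} y. The batch size, step-size rule, and iteration count are arbitrary choices, not the paper's algorithm.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
A = rbf(X, X) + 0.1 ** 2 * np.eye(500)        # K + noise * I

# The posterior mean solves A alpha = y. Instead of a direct O(n^3) solve,
# minimize f(alpha) = 0.5 alpha^T A alpha - alpha^T y with stochastic steps.
v = rng.normal(size=500)                      # power iteration for a safe step size
for _ in range(50):
    v = A @ v
    v /= np.linalg.norm(v)
eta = 1.0 / (v @ A @ v)                       # ~ 1 / lambda_max(A)

alpha = np.zeros(500)
for _ in range(4000):
    B = rng.choice(500, size=50, replace=False)   # random block of coordinates
    alpha[B] -= eta * (A[B] @ alpha - y[B])       # partial gradient step

X_test = np.array([[0.0], [1.5]])
print(rbf(X_test, X) @ alpha)                 # approximate posterior mean
```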
Early stopping by correlating online indicators in neural networks
To minimize the generalization error in neural networks, a novel technique for identifying
overfitting during training is formally introduced. This enables support …
Nonparametric stochastic approximation with large step-sizes
A Dieuleveut, F Bach - 2016 - projecteuclid.org
We consider the random-design least-squares regression problem within the reproducing
kernel Hilbert space (RKHS) framework. Given a stream of independent and identically …
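A sketch of such a single-pass online scheme for a Gaussian kernel, with a constant step size gamma and Polyak averaging of the iterates. The step size and bandwidth are arbitrary, and this is the generic averaged stochastic gradient recursion in an RKHS, not necessarily the paper's exact algorithm.

```python
import numpy as np

def k_vec(A, x, sigma=1.0):
    return np.exp(-((A - x) ** 2).sum(-1) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=n)

gamma = 0.5                  # constant "large" step size; averaging tames its variance
alpha = np.zeros(n)          # f_t = sum_{i <= t} alpha[i] k(x_i, .)
alpha_bar = np.zeros(n)      # running sum giving the average (1/n) sum_t f_t

for t in range(n):
    # f_t = f_{t-1} - gamma * (f_{t-1}(x_t) - y_t) * k(x_t, .)
    pred = k_vec(X[:t], X[t]) @ alpha[:t] if t > 0 else 0.0
    alpha[t] = -gamma * (pred - y[t])
    alpha_bar[: t + 1] += alpha[: t + 1]

alpha_bar /= n
print(k_vec(X, np.array([0.5])) @ alpha_bar)   # averaged-iterate prediction at x = 0.5
```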
Learning theory of distributed spectral algorithms
Spectral algorithms have been widely used and studied in learning theory and inverse
problems. This paper is concerned with distributed spectral algorithms, for handling big data …
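The snippet does not specify which filter function is analyzed; as one concrete instance, here is a hypothetical distributed estimator using the spectral cutoff filter g(s) = 1/s for s >= lam and 0 otherwise, applied to each machine's local kernel matrix and then averaged. The threshold, kernel, and partitioning are assumptions.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def spectral_cutoff_fit(Xp, yp, lam=1e-2):
    """Local spectral estimator: apply g(s) = 1/s for s >= lam (else 0)
    to the eigenvalues of the normalized kernel matrix K / n."""
    n = len(Xp)
    s, U = np.linalg.eigh(rbf(Xp, Xp) / n)
    g = np.where(s >= lam, 1.0 / np.maximum(s, lam), 0.0)   # truncated inverse
    alpha = U @ (g * (U.T @ yp)) / n
    return Xp, alpha

# Distributed variant: run the same spectral algorithm on each partition, then average.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=1200)
parts = np.array_split(rng.permutation(1200), 4)
models = [spectral_cutoff_fit(X[p], y[p]) for p in parts]

X_test = np.array([[0.0], [1.5]])
print(np.mean([rbf(X_test, Xp) @ a for Xp, a in models], axis=0))
```

With the ridge filter g(s) = 1/(s + lam) this reduces to the distributed KRR of the earlier entries, which is one way to see why these methods are analyzed under a common spectral framework.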