Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
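
As a minimal sketch of the variance-reduction idea this survey covers (not code taken from the paper), an SVRG-style loop corrects each stochastic gradient with a periodically recomputed full gradient; the function names and the toy least-squares problem below are assumptions made for illustration.

```python
import numpy as np

def svrg(grad_i, n, x0, step=0.01, epochs=10, inner=None):
    """Minimal SVRG sketch: grad_i(x, i) returns the gradient of the
    i-th component function and n is the number of components."""
    x = x0.copy()
    m = inner or n  # inner-loop length, commonly on the order of n
    for _ in range(epochs):
        x_ref = x.copy()
        full_grad = np.mean([grad_i(x_ref, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = np.random.randint(n)
            # zero-mean correction keeps the estimate unbiased while its
            # variance shrinks as x approaches the reference point x_ref
            v = grad_i(x, i) - grad_i(x_ref, i) + full_grad
            x = x - step * v
    return x

# toy usage on least squares: f_i(x) = 0.5 * (a_i @ x - b_i) ** 2
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grad = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = svrg(grad, len(b), np.zeros(5))
```

Because the correction term has zero mean, the estimate stays unbiased while its variance shrinks as the iterates approach the reference point, which is what allows constant step sizes and linear convergence on strongly convex problems.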

Deep learning for load forecasting with smart meter data: Online Adaptive Recurrent Neural Network

MN Fekri, H Patel, K Grolinger, V Sharma - Applied Energy, 2021 - Elsevier
Electricity load forecasting has been attracting research and industry attention because of its
importance for energy management, infrastructure planning, and budgeting. In recent years …

Making AI forget you: Data deletion in machine learning

A Ginart, M Guan, G Valiant… - Advances in neural …, 2019 - proceedings.neurips.cc
Intense recent discussions have focused on how to provide individuals with control over
when their data can and cannot be used; the EU's Right To Be Forgotten regulation is an …

Train faster, generalize better: Stability of stochastic gradient descent

M Hardt, B Recht, Y Singer - International conference on …, 2016 - proceedings.mlr.press
We show that parametric models trained by a stochastic gradient method (SGM) with few
iterations have vanishing generalization error. We prove our results by arguing that SGM is …
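
One way to see the stability claim concretely is to run the same stochastic gradient method, with the same sampling seed, on two datasets that differ in a single example and measure how far the resulting parameters drift apart; the sketch below is a generic illustration of that uniform-stability quantity, not the authors' analysis or experiments.

```python
import numpy as np

def sgm(A, b, steps=200, lr=0.05, seed=0):
    """Plain stochastic gradient method on a least-squares objective."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        i = rng.integers(len(b))
        x -= lr * (A[i] @ x - b[i]) * A[i]
    return x

rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
A2, b2 = A.copy(), b.copy()
A2[0], b2[0] = rng.normal(size=5), rng.normal()  # neighboring dataset: one example replaced

# same seed -> same sampling path, so the gap isolates the effect of the
# single changed example (the quantity bounded by uniform stability)
gap = np.linalg.norm(sgm(A, b) - sgm(A2, b2))
print(f"parameter gap between the two runs: {gap:.4f}")
```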

Straggler-resilient federated learning: Leveraging the interplay between statistical accuracy and system heterogeneity

A Reisizadeh, I Tziotis, H Hassani… - IEEE Journal on …, 2022 - ieeexplore.ieee.org
Federated learning is a novel paradigm that involves learning from data samples distributed
across a large network of clients while the data remains local. It is, however, known that …

A linearly-convergent stochastic L-BFGS algorithm

P Moritz, R Nishihara, M Jordan - Artificial Intelligence and …, 2016 - proceedings.mlr.press
We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for
strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic …
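
For context, the classical L-BFGS building block such methods rest on is the two-loop recursion, which turns a (possibly stochastic) gradient estimate into a quasi-Newton direction from recent curvature pairs; the sketch below shows only that generic recursion, not the variance-reduced algorithm proposed in the paper.

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    """Standard L-BFGS two-loop recursion: approximates H^{-1} @ g from
    stored curvature pairs s_k = x_{k+1} - x_k, y_k = grad_{k+1} - grad_k.
    g may be a stochastic (e.g. variance-reduced) gradient estimate."""
    q = g.copy()
    alphas = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append(a)
    if s_hist:  # scale by gamma = (s @ y) / (y @ y) as the initial Hessian guess
        s, y = s_hist[-1], y_hist[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), a in zip(zip(s_hist, y_hist), reversed(alphas)):
        rho = 1.0 / (y @ s)
        b = rho * (y @ q)
        q += (a - b) * s
    return q  # the update step is x <- x - step * q
```

In the stochastic setting the curvature pairs are usually formed from averaged iterates or subsampled Hessian-vector products rather than raw noisy gradient differences, which is where methods of this kind differ from the deterministic recipe above.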

On analog gradient descent learning over multiple access fading channels

T Sery, K Cohen - IEEE Transactions on Signal Processing, 2020 - ieeexplore.ieee.org
We consider a distributed learning problem over a multiple access channel (MAC) in a
large wireless network. The computation is performed at the network edge and is based on …

The step decay schedule: A near optimal, geometrically decaying learning rate procedure for least squares

R Ge, SM Kakade, R Kidambi… - Advances in neural …, 2019 - proceedings.neurips.cc
Minimax optimal convergence rates for numerous classes of stochastic convex optimization
problems are well characterized, where the majority of results utilize iterate averaged …
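
As a reminder of what a step-decay (geometrically decaying) schedule looks like in practice, the sketch below splits the horizon into roughly log2(T) stages and multiplies the learning rate by a constant factor at each stage boundary; the halving factor and stage count are illustrative defaults, not the constants analyzed in the paper.

```python
import math

def step_decay_lr(t, total_steps, lr0=1.0, num_stages=None, factor=0.5):
    """Piecewise-constant schedule: split the horizon into equal stages and
    multiply the learning rate by `factor` at every stage boundary."""
    stages = num_stages or max(1, int(math.log2(total_steps)))
    stage_len = max(1, total_steps // stages)
    stage = min(t // stage_len, stages - 1)
    return lr0 * factor ** stage

# example: over 1024 steps there are 10 stages, so the rate halves every ~102 steps
lrs = [step_decay_lr(t, 1024) for t in range(1024)]
```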

The heavy-tail phenomenon in SGD

M Gurbuzbalaban, U Simsekli… - … Conference on Machine …, 2021 - proceedings.mlr.press
In recent years, various notions of capacity and complexity have been proposed for
characterizing the generalization properties of stochastic gradient descent (SGD) in deep …

ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization

NH Pham, LM Nguyen, DT Phan… - Journal of Machine …, 2020 - jmlr.org
We propose a new stochastic first-order algorithmic framework to solve stochastic composite
nonconvex optimization problems that covers both finite-sum and expectation settings. Our …
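
To make the terminology concrete, a stochastic composite step of this flavor pairs a SARAH-style recursive gradient estimator with a proximal operator for the nonsmooth term; the single-sample sketch below, with an l1 regularizer and made-up names, is a simplification and omits details such as mini-batching and the averaging/step-size schemes specified in the full ProxSARAH framework.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sarah_sketch(grad_i, n, x0, lam=0.01, step=0.05, epochs=5):
    """Simplified sketch of a SARAH estimator + proximal step for
    min_x (1/n) * sum_i f_i(x) + lam * ||x||_1."""
    rng = np.random.default_rng(0)
    x = x0.copy()
    for _ in range(epochs):
        x_prev = x.copy()
        v = np.mean([grad_i(x, i) for i in range(n)], axis=0)  # full-gradient anchor
        x = soft_threshold(x - step * v, step * lam)
        for _ in range(n):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_prev, i) + v  # SARAH recursive estimate
            x_prev = x.copy()
            x = soft_threshold(x - step * v, step * lam)
    return x
```

Unlike the SVRG correction, the SARAH estimate is biased, but the recursion keeps its variance controlled between full-gradient anchors, which is what makes it attractive for nonconvex composite problems.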