Variance-reduced methods for machine learning
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
Deep learning for load forecasting with smart meter data: Online Adaptive Recurrent Neural Network
MN Fekri, H Patel, K Grolinger, V Sharma - Applied Energy, 2021 - Elsevier
Electricity load forecasting has been attracting research and industry attention because of its
importance for energy management, infrastructure planning, and budgeting. In recent years …
Making AI forget you: Data deletion in machine learning
Intense recent discussions have focused on how to provide individuals with control over
when their data can and cannot be used---the EU's Right To Be Forgotten regulation is an …
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few
iterations have vanishing generalization error. We prove our results by arguing that SGM is …
Straggler-resilient federated learning: Leveraging the interplay between statistical accuracy and system heterogeneity
Federated learning is a novel paradigm that involves learning from data samples distributed
across a large network of clients while the data remains local. It is, however, known that …
A linearly-convergent stochastic L-BFGS algorithm
We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for
strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic …
On analog gradient descent learning over multiple access fading channels
T Sery, K Cohen - IEEE Transactions on Signal Processing, 2020 - ieeexplore.ieee.org
We consider a distributed learning problem over multiple access channel (MAC) using a
large wireless network. The computation is made by the network edge and is based on …
The step decay schedule: A near optimal, geometrically decaying learning rate procedure for least squares
Minimax optimal convergence rates for numerous classes of stochastic convex optimization
problems are well characterized, where the majority of results utilize iterate averaged …
The heavy-tail phenomenon in SGD
M Gurbuzbalaban, U Simsekli… - … Conference on Machine …, 2021 - proceedings.mlr.press
In recent years, various notions of capacity and complexity have been proposed for
characterizing the generalization properties of stochastic gradient descent (SGD) in deep …
ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization
We propose a new stochastic first-order algorithmic framework to solve stochastic composite
nonconvex optimization problems that covers both finite-sum and expectation settings. Our …