Lower bounds for non-convex stochastic optimization

Y Arjevani, Y Carmon, JC Duchi, DJ Foster… - Mathematical …, 2023 - Springer
We lower bound the complexity of finding ϵ-stationary points (with gradient norm at most ϵ)
using stochastic first-order methods. In a well-studied model where algorithms access …

Mime: Mimicking centralized stochastic algorithms in federated learning

SP Karimireddy, M Jaggi, S Kale, M Mohri… - arXiv preprint arXiv …, 2020 - arxiv.org
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients which gives rise to the client drift phenomenon. In fact …

Breaking the centralized barrier for cross-device federated learning

SP Karimireddy, M Jaggi, S Kale… - Advances in …, 2021 - proceedings.neurips.cc
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients which gives rise to the client drift phenomenon. In fact …

A group-theoretic framework for data augmentation

S Chen, E Dobriban, JH Lee - Journal of Machine Learning Research, 2020 - jmlr.org
Data augmentation is a widely used trick when training deep neural networks: in addition to
the original data, properly transformed data are also added to the training set. However, to …

Conditional gradient methods

G Braun, A Carderera, CW Combettes… - arXiv preprint arXiv …, 2022 - arxiv.org
The purpose of this survey is to serve both as a gentle introduction and a coherent overview
of state-of-the-art Frank--Wolfe algorithms, also called conditional gradient algorithms, for …

The complexity of nonconvex-strongly-concave minimax optimization

S Zhang, J Yang, C Guzmán… - Uncertainty in …, 2021 - proceedings.mlr.press
This paper studies the complexity for finding approximate stationary points of nonconvex-
strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth …

Complexity of finding stationary points of nonconvex nonsmooth functions

J Zhang, H Lin, S Jegelka, S Sra… - … on Machine Learning, 2020 - proceedings.mlr.press
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth,
nonconvex functions. In particular, we study the class of Hadamard semi-differentiable …

Optimal complexity in decentralized training

Y Lu, C De Sa - International conference on machine …, 2021 - proceedings.mlr.press
Decentralization is a promising method of scaling up parallel machine learning systems. In
this paper, we provide a tight lower bound on the iteration complexity for such methods in a …

The complexity of finding stationary points with stochastic gradient descent

Y Drori, O Shamir - International Conference on Machine …, 2020 - proceedings.mlr.press
We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the
gradient norm of smooth, possibly nonconvex functions. We provide several results, implying …

Beyond uniform smoothness: A stopped analysis of adaptive SGD

M Faw, L Rout, C Caramanis… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
This work considers the problem of finding a first-order stationary point of a non-convex
function with potentially unbounded smoothness constant using a stochastic gradient oracle …