Lower bounds for non-convex stochastic optimization
We lower bound the complexity of finding ϵ-stationary points (with gradient norm at most ϵ)
using stochastic first-order methods. In a well-studied model where algorithms access …
Mime: Mimicking centralized stochastic algorithms in federated learning
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients, which gives rise to the client drift phenomenon. In fact …
Breaking the centralized barrier for cross-device federated learning
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients, which gives rise to the client drift phenomenon. In fact …
A group-theoretic framework for data augmentation
Data augmentation is a widely used trick when training deep neural networks: in addition to
the original data, properly transformed data are also added to the training set. However, to …
Conditional gradient methods
G Braun, A Carderera, CW Combettes… - arXiv preprint arXiv …, 2022 - arxiv.org
The purpose of this survey is to serve both as a gentle introduction and a coherent overview
of state-of-the-art Frank--Wolfe algorithms, also called conditional gradient algorithms, for …
The complexity of nonconvex-strongly-concave minimax optimization
This paper studies the complexity for finding approximate stationary points of nonconvex-
strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth …
Complexity of finding stationary points of nonconvex nonsmooth functions
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth,
nonconvex functions. In particular, we study the class of Hadamard semi-differentiable …
Optimal complexity in decentralized training
Decentralization is a promising method of scaling up parallel machine learning systems. In
this paper, we provide a tight lower bound on the iteration complexity for such methods in a …
The complexity of finding stationary points with stochastic gradient descent
We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the
gradient norm of smooth, possibly nonconvex functions. We provide several results, implying …
Beyond uniform smoothness: A stopped analysis of adaptive sgd
This work considers the problem of finding a first-order stationary point of a non-convex
function with potentially unbounded smoothness constant using a stochastic gradient oracle …