Lower bounds for non-convex stochastic optimization
We lower bound the complexity of finding ϵ-stationary points (with gradient norm at most ϵ)
using stochastic first-order methods. In a well-studied model where algorithms access …
Mime: Mimicking centralized stochastic algorithms in federated learning
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients, which gives rise to the client drift phenomenon. In fact …
Breaking the centralized barrier for cross-device federated learning
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients, which gives rise to the client drift phenomenon. In fact …
A group-theoretic framework for data augmentation
Data augmentation is a widely used trick when training deep neural networks: in addition to
the original data, properly transformed data are also added to the training set. However, to …
Conditional gradient methods
G Braun, A Carderera, CW Combettes… - arXiv preprint arXiv …, 2022 - arxiv.org
The purpose of this survey is to serve both as a gentle introduction and a coherent overview
of state-of-the-art Frank--Wolfe algorithms, also called conditional gradient algorithms, for …
The complexity of nonconvex-strongly-concave minimax optimization
This paper studies the complexity for finding approximate stationary points of nonconvex-
strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth …
Complexity of finding stationary points of nonconvex nonsmooth functions
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth,
nonconvex functions. In particular, we study the class of Hadamard semi-differentiable …
Optimal complexity in decentralized training
Decentralization is a promising method of scaling up parallel machine learning systems. In
this paper, we provide a tight lower bound on the iteration complexity for such methods in a …
The complexity of finding stationary points with stochastic gradient descent
We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the
gradient norm of smooth, possibly nonconvex functions. We provide several results, implying …
Beyond uniform smoothness: A stopped analysis of adaptive sgd
This work considers the problem of finding a first-order stationary point of a non-convex
function with potentially unbounded smoothness constant using a stochastic gradient oracle …