Random shuffling beats SGD after finite epochs
A long-standing problem in stochastic optimization is proving that random-shuffling SGD, the without-replacement version of SGD, converges faster than the usual with-replacement SGD. Building …
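A minimal sketch (not taken from any of the papers listed here) of the two sampling schemes these results compare: with-replacement SGD draws an index uniformly at each step, while random reshuffling visits every index exactly once per epoch in a fresh permutation. Function names are illustrative only.

```python
import random

def with_replacement_order(n, steps, rng):
    # With-replacement SGD: each step draws an index uniformly at random,
    # so within an epoch some examples may repeat and others be skipped.
    return [rng.randrange(n) for _ in range(steps)]

def random_reshuffling_order(n, epochs, rng):
    # Without-replacement SGD (random reshuffling): each epoch visits
    # every index exactly once, in a fresh random permutation.
    order = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.extend(perm)
    return order

rng = random.Random(0)
rr = random_reshuffling_order(5, 2, rng)
# Every block of 5 consecutive indices in `rr` is a permutation of {0,...,4}.
```

The convergence-gap question in these papers is exactly whether the second ordering provably yields faster convergence than the first after finitely many epochs.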
Cyclic block coordinate descent with variance reduction for composite nonconvex optimization
Nonconvex optimization is central in solving many machine learning problems, in which
block-wise structure is commonly encountered. In this work, we propose cyclic block …
Closing the convergence gap of SGD without replacement
Stochastic gradient descent without replacement sampling is widely used in practice for
model training. However, the vast majority of SGD analyses assumes data is sampled with …
Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale numerical
optimization because of their cheap iteration costs, low memory requirements, amenability to …
[BOOK][B] Lyapunov arguments in optimization
A Wilson - 2018 - search.proquest.com
Optimization is among the richest modeling languages in science. In statistics and machine
learning, for instance, inference is typically posed as an optimization problem. While there …
Worst-case complexity of cyclic coordinate descent: gap with randomized version
This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for
minimizing a convex quadratic function, which is equivalent to the Gauss–Seidel method …
Global stability of first-order methods for coercive tame functions
We consider first-order methods with constant step size for minimizing locally Lipschitz
coercive functions that are tame in an o-minimal structure on the real field. We prove that if …
Accelerated cyclic coordinate dual averaging with extrapolation for composite convex optimization
Exploiting partial first-order information in a cyclic way is arguably the most natural strategy
to obtain scalable first-order methods. However, despite their wide use in practice, cyclic …
Convergence rate of block-coordinate maximization Burer–Monteiro method for solving large SDPs
Semidefinite programming (SDP) with diagonal constraints arises in many optimization
problems, such as Max-Cut, community detection and group synchronization. Although …