Random shuffling beats SGD after finite epochs

J Haochen, S Sra - International Conference on Machine …, 2019 - proceedings.mlr.press
A long-standing problem in stochastic optimization is proving that SGD without replacement (random reshuffling) converges faster than the usual with-replacement SGD. Building …
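
To make the contrast concrete, here is a minimal NumPy sketch of the two sampling schemes on a least-squares objective; the problem setup, step size, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgd_epoch(w, X, y, lr, without_replacement, rng):
    """One epoch of single-sample SGD on 0.5 * ||X @ w - y||^2."""
    n = len(y)
    if without_replacement:
        order = rng.permutation(n)          # random reshuffling: each index once
    else:
        order = rng.integers(0, n, size=n)  # with replacement: i.i.d. draws
    for i in order:
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of the i-th summand
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true

w_rr, w_iid = np.zeros(5), np.zeros(5)
for _ in range(50):
    w_rr = sgd_epoch(w_rr, X, y, 0.01, True, rng)
    w_iid = sgd_epoch(w_iid, X, y, 0.01, False, rng)
print(np.linalg.norm(w_rr - w_true), np.linalg.norm(w_iid - w_true))
```

Both variants touch n gradients per epoch; the only difference is whether the index sequence is a fresh permutation or i.i.d. uniform draws.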

Cyclic block coordinate descent with variance reduction for composite nonconvex optimization

X Cai, C Song, S Wright… - … Conference on Machine …, 2023 - proceedings.mlr.press
Nonconvex optimization is central to solving many machine learning problems, in which block-wise structure is commonly encountered. In this work, we propose cyclic block …
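
As a rough sketch of the cyclic block structure this line of work builds on (omitting the paper's variance-reduction component), here is plain cyclic BCD on a smooth quadratic; the block partition and step size below are placeholder choices.

```python
import numpy as np

def cyclic_bcd(grad_f, x, blocks, lr, n_epochs):
    """Cyclic block coordinate descent: sweep the blocks in a fixed order,
    taking a partial gradient step on one block at a time."""
    for _ in range(n_epochs):
        for blk in blocks:                   # fixed cyclic order, no sampling
            g = grad_f(x)
            x[blk] -= lr * g[blk]
    return x

# Example: minimize 0.5 * x^T A x - b^T x over two coordinate blocks.
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M.T @ M + np.eye(6)                      # symmetric positive definite
b = rng.standard_normal(6)
L = np.linalg.eigvalsh(A).max()              # global smoothness constant
x = cyclic_bcd(lambda v: A @ v - b, np.zeros(6),
               [np.arange(0, 3), np.arange(3, 6)], lr=1.0 / L, n_epochs=2000)
print(np.linalg.norm(A @ x - b))             # residual should be near zero
```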

Closing the convergence gap of SGD without replacement

S Rajput, A Gupta… - … Conference on Machine …, 2020 - proceedings.mlr.press
Stochastic gradient descent without replacement sampling is widely used in practice for
model training. However, the vast majority of SGD analyses assumes data is sampled with …

[BOOK][B] Optimization for data analysis

SJ Wright, B Recht - 2022 - books.google.com
Optimization techniques are at the core of data science, including data analysis and
machine learning. An understanding of basic optimization techniques and their fundamental …

Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

J Nutini, I Laradji, M Schmidt - arXiv preprint arXiv:1712.08859, 2017 - arxiv.org
Block coordinate descent (BCD) methods are widely used for large-scale numerical
optimization because of their cheap iteration costs, low memory requirements, amenability to …
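
The baseline that these faster greedy rules refine is the classical Gauss-Southwell rule: at each step, update the block whose partial gradient is currently largest. A minimal sketch, with the block sizes and the quadratic test problem as illustrative assumptions:

```python
import numpy as np

def greedy_bcd(grad_f, x, blocks, lr, n_iters):
    """Greedy (Gauss-Southwell) BCD: at each step, update the block whose
    partial gradient currently has the largest Euclidean norm."""
    for _ in range(n_iters):
        g = grad_f(x)
        k = max(range(len(blocks)), key=lambda j: np.linalg.norm(g[blocks[j]]))
        x[blocks[k]] -= lr * g[blocks[k]]
    return x

rng = np.random.default_rng(2)
M = rng.standard_normal((8, 8))
A = M.T @ M + np.eye(8)
b = rng.standard_normal(8)
blocks = [np.arange(i, i + 2) for i in range(0, 8, 2)]
x = greedy_bcd(lambda v: A @ v - b, np.zeros(8), blocks,
               lr=1.0 / np.linalg.eigvalsh(A).max(), n_iters=4000)
print(np.linalg.norm(A @ x - b))
```

Unlike the cyclic rule, this needs the full gradient at every step to pick the block, which is exactly the cost the paper's cheaper greedy rules aim to avoid.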

[BOOK][B] Lyapunov arguments in optimization

A Wilson - 2018 - search.proquest.com
Optimization is among the richest modeling languages in science. In statistics and machine
learning, for instance, inference is typically posed as an optimization problem. While there …
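
As one standard instance of such an argument (a textbook case summarized here, not drawn from the thesis): a Lyapunov energy for gradient descent on an L-smooth convex function decreases monotonically and yields the usual O(1/k) rate.

```latex
% Gradient descent x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k) on an L-smooth
% convex f with minimizer x^*. Take the Lyapunov energy
E_k := f(x_k) - f(x^*).
% The descent lemma gives
f(x_{k+1}) \le f(x_k) - \tfrac{1}{2L}\,\|\nabla f(x_k)\|^2
\quad\Longrightarrow\quad E_{k+1} \le E_k,
% and combining this decrease with convexity yields the familiar rate
E_k \le \frac{L\,\|x_0 - x^*\|^2}{2k}.
```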

Worst-case complexity of cyclic coordinate descent: gap with randomized version

R Sun, Y Ye - Mathematical Programming, 2021 - Springer
This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function, which is equivalent to the Gauss–Seidel method …
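
Concretely, exact cyclic coordinate minimization of the quadratic 0.5 x^T A x - b^T x reproduces a Gauss–Seidel sweep for A x = b; a minimal sketch (the test matrix and iteration count are arbitrary):

```python
import numpy as np

def gauss_seidel_sweep(A, b, x):
    """One cycle of exact coordinate minimization of 0.5 * x^T A x - b^T x:
    solving for coordinate i with the others fixed is the Gauss-Seidel update."""
    for i in range(len(b)):
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)                  # SPD, so the sweeps converge
b = rng.standard_normal(5)
x = np.zeros(5)
for _ in range(200):
    x = gauss_seidel_sweep(A, b, x)
print(np.linalg.norm(A @ x - b))         # ~ 0: the sweeps solve A x = b
```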

Global stability of first-order methods for coercive tame functions

C Josz, L Lai - Mathematical Programming, 2024 - Springer
We consider first-order methods with constant step size for minimizing locally Lipschitz
coercive functions that are tame in an o-minimal structure on the real field. We prove that if …
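
A minimal illustration of the setting (a toy example under my own assumptions, not the paper's): the subgradient method with a constant step on f(x) = ||x||_1, which is locally Lipschitz, coercive, and tame (semialgebraic). With a constant step the iterates do not converge exactly, but they stay bounded near the minimizer.

```python
import numpy as np

def subgradient_method(subgrad, x, step, n_iters):
    """Constant-step first-order method: x_{k+1} = x_k - step * g_k,
    where g_k is any subgradient of f at x_k."""
    for _ in range(n_iters):
        x = x - step * subgrad(x)
    return x

# f(x) = ||x||_1: np.sign returns a valid subgradient at every point.
# The iterates remain bounded and end up within one step length of 0.
x = subgradient_method(np.sign, np.array([3.05, -2.02, 0.44]),
                       step=0.1, n_iters=1000)
print(x)  # each entry oscillates within roughly [-0.1, 0.1]
```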

Accelerated cyclic coordinate dual averaging with extrapolation for composite convex optimization

CY Lin, C Song, J Diakonikolas - … Conference on Machine …, 2023 - proceedings.mlr.press
Exploiting partial first-order information in a cyclic way is arguably the most natural strategy
to obtain scalable first-order methods. However, despite their wide use in practice, cyclic …

Convergence rate of block-coordinate maximization Burer–Monteiro method for solving large SDPs

MA Erdogdu, A Ozdaglar, PA Parrilo… - Mathematical Programming, 2022 - Springer
Semidefinite programming (SDP) with diagonal constraints arises in many optimization
problems, such as Max-Cut, community detection, and group synchronization. Although …
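
A hedged sketch of block-coordinate maximization on the Burer–Monteiro factorization X = V V^T with unit-norm rows (the diagonal constraint); the random test matrix and rank below are illustrative. Each row update is an exact maximization over that block, so the objective is nondecreasing along the sweeps.

```python
import numpy as np

def block_coordinate_maximize(C, k, n_sweeps, rng):
    """Block-coordinate maximization of <C, V V^T> over matrices V with
    unit-norm rows (Burer-Monteiro form of an SDP with diagonal constraints).
    The update v_i <- g_i / ||g_i|| exactly maximizes over row i."""
    n = C.shape[0]
    V = rng.standard_normal((n, k))
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # feasible start: unit rows
    for _ in range(n_sweeps):
        for i in range(n):
            g = C[i] @ V - C[i, i] * V[i]           # g_i = sum_{j != i} C_ij v_j
            norm = np.linalg.norm(g)
            if norm > 0:
                V[i] = g / norm
    return V

rng = np.random.default_rng(4)
W = rng.random((10, 10))
W = (W + W.T) / 2                                   # symmetric objective matrix
V = block_coordinate_maximize(W, k=4, n_sweeps=100, rng=rng)
print(np.trace(W @ (V @ V.T)))                      # objective after the sweeps
```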