Random shuffling beats SGD after finite epochs
A long-standing problem in stochastic optimization is proving that random-shuffling SGD, the without-replacement version of SGD, converges faster than the usual with-replacement SGD. Building …
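A minimal sketch (not taken from any of the papers listed here) of the two sampling schemes these results compare: with-replacement SGD draws an index uniformly at each step, while random reshuffling visits every index exactly once per epoch in a fresh permutation. Function names are illustrative only.

```python
import random

def with_replacement_order(n, steps, rng):
    # With-replacement SGD: each step draws an index uniformly at random,
    # so within an epoch some examples may repeat and others be skipped.
    return [rng.randrange(n) for _ in range(steps)]

def random_reshuffling_order(n, epochs, rng):
    # Without-replacement SGD (random reshuffling): each epoch visits
    # every index exactly once, in a fresh random permutation.
    order = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.extend(perm)
    return order

rng = random.Random(0)
rr = random_reshuffling_order(5, 2, rng)
# Every block of 5 consecutive indices in `rr` is a permutation of {0,...,4}.
```

The convergence-gap question in these papers is exactly whether the second ordering provably yields faster convergence than the first after finitely many epochs.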
Cyclic block coordinate descent with variance reduction for composite nonconvex optimization
Nonconvex optimization is central in solving many machine learning problems, in which
block-wise structure is commonly encountered. In this work, we propose cyclic block …
Closing the convergence gap of SGD without replacement
Stochastic gradient descent without replacement sampling is widely used in practice for
model training. However, the vast majority of SGD analyses assumes data is sampled with …
Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale numerical
optimization because of their cheap iteration costs, low memory requirements, amenability to …
[BOOK][B] Lyapunov arguments in optimization
A Wilson - 2018 - search.proquest.com
Optimization is among the richest modeling languages in science. In statistics and machine
learning, for instance, inference is typically posed as an optimization problem. While there …
Worst-case complexity of cyclic coordinate descent: gap with randomized version
This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for
minimizing a convex quadratic function, which is equivalent to the Gauss–Seidel method …
Global stability of first-order methods for coercive tame functions
We consider first-order methods with constant step size for minimizing locally Lipschitz
coercive functions that are tame in an o-minimal structure on the real field. We prove that if …
Accelerated cyclic coordinate dual averaging with extrapolation for composite convex optimization
Exploiting partial first-order information in a cyclic way is arguably the most natural strategy
to obtain scalable first-order methods. However, despite their wide use in practice, cyclic …
Convergence rate of block-coordinate maximization Burer–Monteiro method for solving large SDPs
Semidefinite programming (SDP) with diagonal constraints arises in many optimization
problems, such as Max-Cut, community detection and group synchronization. Although …