Tighter lower bounds for shuffling SGD: Random permutations and beyond

J Cha, J Lee, C Yun - International Conference on Machine …, 2023 - proceedings.mlr.press
We study convergence lower bounds of without-replacement stochastic gradient descent
(SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most …
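
For context, the finite-sum setting and the without-replacement (shuffled) SGD update that such lower bounds concern can be written as follows; the notation below (F, f_i, the epoch-k permutation σ_k, step size η) is mine, not the paper's:

    \[
      \min_{x \in \mathbb{R}^d} F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad
      x^{k}_{t+1} = x^{k}_{t} - \eta\, \nabla f_{\sigma_k(t+1)}\!\left(x^{k}_{t}\right), \quad t = 0, \dots, n-1,
    \]

where σ_k is a permutation of {1, …, n} used during epoch k and x^{k+1}_0 = x^k_n. Random reshuffling draws a fresh uniform σ_k every epoch, single shuffling reuses σ_1 throughout, and incremental gradient uses the identity permutation.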

AsGrad: A sharp unified analysis of asynchronous-SGD algorithms

R Islamov, M Safaryan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
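
As a rough illustration of the setting rather than the AsGrad algorithms themselves, here is a toy single-process simulation of asynchronous SGD in which heterogeneous workers apply gradients computed at stale iterates; the quadratic objective, delay model, and all names are assumptions of mine:

    import numpy as np

    # Toy asynchronous SGD: the server applies whichever worker's gradient arrives
    # next, computed at the (possibly stale) iterate that worker last read.
    # Objective: f(x) = 0.5 * ||A x - b||^2, rows split across workers.
    rng = np.random.default_rng(0)
    d, n_workers, steps, lr = 5, 4, 200, 0.05
    A = rng.normal(size=(20, d)); b = rng.normal(size=20)
    shards = np.array_split(np.arange(20), n_workers)      # each worker's rows

    def grad(x, idx):                                       # per-worker gradient
        Ai, bi = A[idx], b[idx]
        return Ai.T @ (Ai @ x - bi) / len(idx)

    x = np.zeros(d)
    snapshots = [x.copy() for _ in range(n_workers)]        # iterate each worker last saw
    delays = rng.integers(1, 5, size=n_workers)             # heterogeneous speeds
    finish = delays.astype(float).copy()                    # next arrival time per worker
    for _ in range(steps):
        w = int(np.argmin(finish))                          # next worker to report
        x = x - lr * grad(snapshots[w], shards[w])          # server applies stale gradient
        snapshots[w] = x.copy()                             # worker re-reads the model
        finish[w] += delays[w]
    print("final loss:", 0.5 * np.linalg.norm(A @ x - b) ** 2)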

GraB: Finding provably better data permutations than random reshuffling

Y Lu, W Guo, CM De Sa - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Random reshuffling, which randomly permutes the dataset each epoch, is widely adopted in
model training because it yields faster convergence than with-replacement sampling …
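
A minimal sketch of the two sampling schemes contrasted in this snippet, with-replacement SGD versus random reshuffling, on an illustrative least-squares problem; this shows the baseline being improved upon, not the GraB reordering itself:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lr, epochs = 32, 5, 0.05, 10
    A = rng.normal(size=(n, d)); b = rng.normal(size=n)
    grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])          # gradient of 0.5*(a_i^T x - b_i)^2

    def sgd_with_replacement(x):
        for _ in range(epochs * n):
            i = rng.integers(n)                             # i.i.d. index at every step
            x = x - lr * grad_i(x, i)
        return x

    def random_reshuffling(x):
        for _ in range(epochs):
            for i in rng.permutation(n):                    # fresh permutation each epoch
                x = x - lr * grad_i(x, i)
        return x

    loss = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
    print(loss(sgd_with_replacement(np.zeros(d))), loss(random_reshuffling(np.zeros(d))))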

Min-max multi-objective bilevel optimization with applications in robust machine learning

A Gu, S Lu, P Ram, TW Weng - The Eleventh International …, 2022 - openreview.net
We consider a generic min-max multi-objective bilevel optimization problem with
applications in robust machine learning such as representation learning and …
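
For orientation, one common way to write a min-max multi-objective bilevel problem of this flavor; the notation is mine and may not match the paper's exact formulation:

    \[
      \min_{x}\ \max_{\lambda \in \Delta_m}\ \sum_{i=1}^{m} \lambda_i\, f_i\big(x, y_i^{*}(x)\big)
      \quad \text{s.t.} \quad y_i^{*}(x) \in \arg\min_{y}\ g_i(x, y), \quad i = 1, \dots, m,
    \]

where Δ_m is the probability simplex, the f_i are upper-level objectives, and the g_i are lower-level objectives whose minimizers feed back into the upper level.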

Characterizing & finding good data orderings for fast convergence of sequential gradient methods

A Mohtashami, S Stich, M Jaggi - arXiv preprint arXiv:2202.01838, 2022 - arxiv.org
While SGD, which samples from the data with replacement, is widely studied in theory, a
variant called Random Reshuffling (RR) is more common in practice. RR iterates through …

Langevin Quasi-Monte Carlo

S Liu - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Langevin Monte Carlo (LMC) and its stochastic gradient versions are powerful
algorithms for sampling from complex high-dimensional distributions. To sample from a …
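
For reference, the basic unadjusted Langevin iteration these methods build on, targeting a density proportional to exp(-U(x)); a quasi-Monte Carlo variant would replace the i.i.d. Gaussian draws with a low-discrepancy sequence, but the sketch below is illustrative, not the paper's construction:

    import numpy as np

    # Unadjusted Langevin Monte Carlo targeting pi(x) ∝ exp(-U(x)).
    # Example target: standard 2-D Gaussian, U(x) = 0.5 * ||x||^2, so grad U(x) = x.
    rng = np.random.default_rng(0)
    d, step, n_iter = 2, 0.1, 5000
    grad_U = lambda x: x

    x = np.zeros(d)
    samples = np.empty((n_iter, d))
    for k in range(n_iter):
        noise = rng.standard_normal(d)                          # i.i.d. Gaussian draws;
        x = x - step * grad_U(x) + np.sqrt(2 * step) * noise    # a QMC variant would swap these
        samples[k] = x
    print("sample mean:", samples[2000:].mean(axis=0))          # ≈ 0 for this target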

Mini-Batch Optimization of Contrastive Loss

J Cho, K Sreenivasan, K Lee, K Mun, S Yi… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive learning has gained significant attention as a method for self-supervised
learning. The contrastive loss function ensures that embeddings of positive sample pairs …
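
A generic in-batch contrastive (InfoNCE-style) loss, to make concrete what a mini-batch version of the contrastive loss computes; the temperature, normalization, and one-directional form are choices of mine, and the loss analyzed in the paper may differ:

    import numpy as np

    def info_nce_loss(z1, z2, temperature=0.5):
        """In-batch contrastive loss: z1[i] and z2[i] embed two views of example i;
        every other example in the mini-batch serves as a negative."""
        z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)     # L2-normalize embeddings
        z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
        logits = z1 @ z2.T / temperature                        # pairwise similarities
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))                     # positives on the diagonal

    rng = np.random.default_rng(0)
    z1, z2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
    print(info_nce_loss(z1, z2))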

Coordinating distributed example orders for provably accelerated training

AF Cooper, W Guo, K Pham, T Yuan… - Thirty-seventh …, 2023 - proceedings.neurips.cc
Recent research on online Gradient Balancing (GraB) has revealed that there exist
permutation-based example orderings for SGD that are guaranteed to outperform random …
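
Loosely, the orderings in question come from a herding-style gradient-balancing step: examples are reordered so that running sums of centered gradients stay small. The sketch below is a simplified single-worker caricature of that idea, not the GraB or CD-GraB algorithm (which reuse stale gradients from the previous epoch, come with formal guarantees, and, in CD-GraB, coordinate the reordering across workers):

    import numpy as np

    def balance_order(grads):
        """Greedy herding-style reordering: give each centered gradient a +/- sign so
        the running signed sum stays small; '+' examples go to the front of the next
        order, '-' examples to the back in reverse."""
        centered = grads - grads.mean(axis=0)
        running = np.zeros(grads.shape[1])
        front, back = [], []
        for i, c in enumerate(centered):
            if np.linalg.norm(running + c) <= np.linalg.norm(running - c):
                running += c; front.append(i)
            else:
                running -= c; back.append(i)
        return front + back[::-1]

    rng = np.random.default_rng(0)
    per_example_grads = rng.normal(size=(10, 4))   # stand-in for per-example gradients
    print(balance_order(per_example_grads))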

Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling

J Li, H Huang - arXiv preprint arXiv:2411.05868, 2024 - arxiv.org
Bilevel Optimization has experienced significant advancements recently with the
introduction of new efficient algorithms. Mirroring the success in single-level optimization …
