Tighter lower bounds for shuffling SGD: Random permutations and beyond
We study convergence lower bounds of without-replacement stochastic gradient descent
(SGD) for solving smooth (strongly-) convex finite-sum minimization problems. Unlike most …
AsGrad: A sharp unified analysis of asynchronous-SGD algorithms
R Islamov, M Safaryan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
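The core mechanism studied here, workers returning gradients computed at stale copies of the model, can be illustrated with a small simulation. The sketch below assumes a toy least-squares objective and random worker arrival order; it does not reproduce the AsGrad analysis or the paper's exact update rules.

```python
import numpy as np

# Sketch of asynchronous SGD with stale gradients (illustrative only).
# Workers hold possibly outdated model copies; the server applies each
# gradient as soon as it arrives, so updates may use delayed iterates.

def grad_fi(x, a_i, b_i):
    # gradient of a single least-squares term 0.5 * (a_i @ x - b_i)^2
    return a_i * (a_i @ x - b_i)

rng = np.random.default_rng(0)
n, d, n_workers = 32, 5, 4
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

x = np.zeros(d)
lr = 0.05
snapshots = [x.copy()] * n_workers   # model copy each worker last pulled
for t in range(200):
    w = rng.integers(n_workers)      # the worker that finishes next (random delays)
    i = rng.integers(n)              # its sampled data point
    g = grad_fi(snapshots[w], A[i], b[i])   # gradient at a *stale* iterate
    x = x - lr * g                   # server applies the delayed gradient
    snapshots[w] = x.copy()          # worker pulls the current model for its next job
print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```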
Grab: Finding provably better data permutations than random reshuffling
Random reshuffling, which randomly permutes the dataset each epoch, is widely adopted in
model training because it yields faster convergence than with-replacement sampling …
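The contrast between the two sampling schemes is easy to see in code. The sketch below runs one SGD pass per epoch on a toy least-squares problem, once with a fresh random permutation (random reshuffling) and once with i.i.d. with-replacement indices; the objective and step size are illustrative, not taken from the paper.

```python
import numpy as np

# Random reshuffling (RR) draws a fresh permutation each epoch and visits every
# example exactly once; with-replacement SGD draws i.i.d. indices, allowing repeats.

def sgd_epoch(x, A, b, lr, order):
    for i in order:                      # one pass over the data in the given order
        g = A[i] * (A[i] @ x - b[i])     # gradient of 0.5 * (A[i] @ x - b[i])^2
        x = x - lr * g
    return x

rng = np.random.default_rng(0)
n, d = 64, 8
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
x_rr = x_wr = np.zeros(d)
for epoch in range(30):
    perm = rng.permutation(n)                 # RR: new permutation every epoch
    x_rr = sgd_epoch(x_rr, A, b, 0.02, perm)
    iid = rng.integers(n, size=n)             # with-replacement: i.i.d. indices
    x_wr = sgd_epoch(x_wr, A, b, 0.02, iid)
print("RR loss:", 0.5 * np.mean((A @ x_rr - b) ** 2))
print("WR loss:", 0.5 * np.mean((A @ x_wr - b) ** 2))
```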
Min-max multi-objective bilevel optimization with applications in robust machine learning
We consider a generic min-max multi-objective bilevel optimization problem with
applications in robust machine learning such as representation learning and …
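The snippet does not show the paper's formulation, but a generic min-max multi-objective bilevel problem is commonly written as below; the objectives f_i, lower-level losses g_i, and simplex weights λ are placeholders, not the paper's notation.

```latex
% Generic min--max multi-objective bilevel problem (illustrative notation):
\min_{x}\; \max_{\lambda \in \Delta_m}\; \sum_{i=1}^{m} \lambda_i\, f_i\bigl(x, y_i^*(x)\bigr)
\quad \text{s.t.}\quad
y_i^*(x) \in \operatorname*{arg\,min}_{y}\; g_i(x, y), \qquad i = 1, \dots, m
```

Here Δ_m is the probability simplex, so the inner maximum selects the worst-case weighting of the m upper-level objectives, each evaluated at its own lower-level solution.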
Characterizing & finding good data orderings for fast convergence of sequential gradient methods
While SGD, which samples from the data with replacement, is widely studied in theory, a
variant called Random Reshuffling (RR) is more common in practice. RR iterates through …
Langevin Quasi-Monte Carlo
S Liu - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Abstract Langevin Monte Carlo (LMC) and its stochastic gradient versions are powerful
algorithms for sampling from complex high-dimensional distributions. To sample from a …
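The LMC update the paper builds on is a single line: a gradient step on the negative log-density plus injected Gaussian noise. The sketch below samples from a standard Gaussian target; the quasi-Monte Carlo modification (replacing i.i.d. Gaussian draws with low-discrepancy points) is not reproduced here.

```python
import numpy as np

# Unadjusted Langevin Monte Carlo:
#   x_{k+1} = x_k - h * grad_U(x_k) + sqrt(2h) * xi_k,  xi_k ~ N(0, I),
# which targets the density proportional to exp(-U(x)).

def grad_U(x):
    return x            # U(x) = ||x||^2 / 2, so the target is a standard Gaussian

rng = np.random.default_rng(0)
d, h, n_steps = 2, 0.1, 5000
x = np.zeros(d)
samples = []
for k in range(n_steps):
    xi = rng.normal(size=d)
    x = x - h * grad_U(x) + np.sqrt(2 * h) * xi
    samples.append(x.copy())
samples = np.array(samples)
print("sample mean:", samples[1000:].mean(axis=0))   # should be near 0
print("sample var :", samples[1000:].var(axis=0))    # should be near 1
```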
Mini-Batch Optimization of Contrastive Loss
Contrastive learning has gained significant attention as a method for self-supervised
learning. The contrastive loss function ensures that embeddings of positive sample pairs …
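A standard in-batch contrastive loss of the kind analyzed here treats each example's paired partner as the positive and the rest of the mini-batch as negatives. The sketch below is the usual InfoNCE-style form with an illustrative temperature and toy embeddings; it is not the paper's exact objective.

```python
import numpy as np

# In-batch contrastive (InfoNCE-style) loss over a mini-batch of positive pairs
# (u_i, v_i): the positives sit on the diagonal of the similarity matrix.

def info_nce(U, V, tau=0.1):
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # L2-normalize embeddings
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    logits = U @ V.T / tau                             # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                # positives on the diagonal

rng = np.random.default_rng(0)
B, d = 16, 32
U = rng.normal(size=(B, d))
V = U + 0.1 * rng.normal(size=(B, d))                  # noisy positives
print("mini-batch contrastive loss:", info_nce(U, V))
```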
Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling
Bilevel Optimization has experienced significant advancements recently with the
introduction of new efficient algorithms. Mirroring the success in single-level optimization …
CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training
Recent research on online Gradient Balancing (GraB) has revealed that there exist
permutation-based example orderings that are guaranteed to outperform random reshuffling …
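The balancing step underlying GraB-style orderings can be sketched as a greedy sign assignment on centered per-example gradients: keep the running signed sum small, then place +1 examples toward the front of the next permutation and -1 examples toward the back. The code below is a single-node sketch with toy gradient vectors; the distributed coordination that CD-GraB adds is not reproduced, and the placement rule is a reading of the GraB idea rather than the paper's pseudocode.

```python
import numpy as np

# Greedy sign-balancing reordering (GraB-style sketch): the next epoch's
# permutation is built so that partial sums of centered gradients stay small.

def balance_reorder(grads, order):
    mean = grads.mean(axis=0)
    run = np.zeros_like(mean)
    front, back = [], []
    for i in order:
        c = grads[i] - mean                    # centered per-example gradient
        if np.linalg.norm(run + c) <= np.linalg.norm(run - c):
            run += c
            front.append(i)                    # sign +1: placed toward the front
        else:
            run -= c
            back.append(i)                     # sign -1: placed toward the back
    return front + back[::-1]                  # permutation for the next epoch

rng = np.random.default_rng(0)
n, d = 8, 3
grads = rng.normal(size=(n, d))                # stand-ins for per-example gradients
print("next epoch order:", balance_reorder(grads, list(range(n))))
```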