Tighter lower bounds for shuffling SGD: Random permutations and beyond
We study convergence lower bounds of without-replacement stochastic gradient descent
(SGD) for solving smooth (strongly-) convex finite-sum minimization problems. Unlike most …
AsGrad: A sharp unified analysis of asynchronous-SGD algorithms
R Islamov, M Safaryan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
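The core mechanism studied here, workers returning gradients computed at stale copies of the model, can be illustrated with a small simulation. The sketch below assumes a toy least-squares objective and random worker arrival order; it does not reproduce the AsGrad analysis or the paper's exact update rules.

```python
import numpy as np

# Sketch of asynchronous SGD with stale gradients (illustrative only).
# Workers hold possibly outdated model copies; the server applies each
# gradient as soon as it arrives, so updates may use delayed iterates.

def grad_fi(x, a_i, b_i):
    # gradient of a single least-squares term 0.5 * (a_i @ x - b_i)^2
    return a_i * (a_i @ x - b_i)

rng = np.random.default_rng(0)
n, d, n_workers = 32, 5, 4
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

x = np.zeros(d)
lr = 0.05
snapshots = [x.copy()] * n_workers   # model copy each worker last pulled
for t in range(200):
    w = rng.integers(n_workers)      # the worker that finishes next (random delays)
    i = rng.integers(n)              # its sampled data point
    g = grad_fi(snapshots[w], A[i], b[i])   # gradient at a *stale* iterate
    x = x - lr * g                   # server applies the delayed gradient
    snapshots[w] = x.copy()          # worker pulls the current model for its next job
print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```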
Grab: Finding provably better data permutations than random reshuffling
Random reshuffling, which randomly permutes the dataset each epoch, is widely adopted in
model training because it yields faster convergence than with-replacement sampling …
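The contrast between the two sampling schemes is easy to see in code. The sketch below runs one SGD pass per epoch on a toy least-squares problem, once with a fresh random permutation (random reshuffling) and once with i.i.d. with-replacement indices; the objective and step size are illustrative, not taken from the paper.

```python
import numpy as np

# Random reshuffling (RR) draws a fresh permutation each epoch and visits every
# example exactly once; with-replacement SGD draws i.i.d. indices, allowing repeats.

def sgd_epoch(x, A, b, lr, order):
    for i in order:                      # one pass over the data in the given order
        g = A[i] * (A[i] @ x - b[i])     # gradient of 0.5 * (A[i] @ x - b[i])^2
        x = x - lr * g
    return x

rng = np.random.default_rng(0)
n, d = 64, 8
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
x_rr = x_wr = np.zeros(d)
for epoch in range(30):
    perm = rng.permutation(n)                 # RR: new permutation every epoch
    x_rr = sgd_epoch(x_rr, A, b, 0.02, perm)
    iid = rng.integers(n, size=n)             # with-replacement: i.i.d. indices
    x_wr = sgd_epoch(x_wr, A, b, 0.02, iid)
print("RR loss:", 0.5 * np.mean((A @ x_rr - b) ** 2))
print("WR loss:", 0.5 * np.mean((A @ x_wr - b) ** 2))
```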
Min-max multi-objective bilevel optimization with applications in robust machine learning
We consider a generic min-max multi-objective bilevel optimization problem with
applications in robust machine learning such as representation learning and …
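The snippet does not show the paper's formulation, but a generic min-max multi-objective bilevel problem is commonly written as below; the objectives f_i, lower-level losses g_i, and simplex weights λ are placeholders, not the paper's notation.

```latex
% Generic min--max multi-objective bilevel problem (illustrative notation):
\min_{x}\; \max_{\lambda \in \Delta_m}\; \sum_{i=1}^{m} \lambda_i\, f_i\bigl(x, y_i^*(x)\bigr)
\quad \text{s.t.}\quad
y_i^*(x) \in \operatorname*{arg\,min}_{y}\; g_i(x, y), \qquad i = 1, \dots, m
```

Here Δ_m is the probability simplex, so the inner maximum selects the worst-case weighting of the m upper-level objectives, each evaluated at its own lower-level solution.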
Characterizing & finding good data orderings for fast convergence of sequential gradient methods
While SGD, which samples from the data with replacement, is widely studied in theory, a
variant called Random Reshuffling (RR) is more common in practice. RR iterates through …
Langevin Quasi-Monte Carlo
S Liu - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Abstract Langevin Monte Carlo (LMC) and its stochastic gradient versions are powerful
algorithms for sampling from complex high-dimensional distributions. To sample from a …
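The LMC update the paper builds on is a single line: a gradient step on the negative log-density plus injected Gaussian noise. The sketch below samples from a standard Gaussian target; the quasi-Monte Carlo modification (replacing i.i.d. Gaussian draws with low-discrepancy points) is not reproduced here.

```python
import numpy as np

# Unadjusted Langevin Monte Carlo:
#   x_{k+1} = x_k - h * grad_U(x_k) + sqrt(2h) * xi_k,  xi_k ~ N(0, I),
# which targets the density proportional to exp(-U(x)).

def grad_U(x):
    return x            # U(x) = ||x||^2 / 2, so the target is a standard Gaussian

rng = np.random.default_rng(0)
d, h, n_steps = 2, 0.1, 5000
x = np.zeros(d)
samples = []
for k in range(n_steps):
    xi = rng.normal(size=d)
    x = x - h * grad_U(x) + np.sqrt(2 * h) * xi
    samples.append(x.copy())
samples = np.array(samples)
print("sample mean:", samples[1000:].mean(axis=0))   # should be near 0
print("sample var :", samples[1000:].var(axis=0))    # should be near 1
```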
Mini-Batch Optimization of Contrastive Loss
Contrastive learning has gained significant attention as a method for self-supervised
learning. The contrastive loss function ensures that embeddings of positive sample pairs …
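A standard in-batch contrastive loss of the kind analyzed here treats each example's paired partner as the positive and the rest of the mini-batch as negatives. The sketch below is the usual InfoNCE-style form with an illustrative temperature and toy embeddings; it is not the paper's exact objective.

```python
import numpy as np

# In-batch contrastive (InfoNCE-style) loss over a mini-batch of positive pairs
# (u_i, v_i): the positives sit on the diagonal of the similarity matrix.

def info_nce(U, V, tau=0.1):
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # L2-normalize embeddings
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    logits = U @ V.T / tau                             # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                # positives on the diagonal

rng = np.random.default_rng(0)
B, d = 16, 32
U = rng.normal(size=(B, d))
V = U + 0.1 * rng.normal(size=(B, d))                  # noisy positives
print("mini-batch contrastive loss:", info_nce(U, V))
```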
Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling
Bilevel Optimization has experienced significant advancements recently with the
introduction of new efficient algorithms. Mirroring the success in single-level optimization …
CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training
Recent research on online Gradient Balancing (GraB) has revealed that there exist
permutation-based example orderings that are guaranteed to outperform random reshuffling …
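The balancing step underlying GraB-style orderings can be sketched as a greedy sign assignment on centered per-example gradients: keep the running signed sum small, then place +1 examples toward the front of the next permutation and -1 examples toward the back. The code below is a single-node sketch with toy gradient vectors; the distributed coordination that CD-GraB adds is not reproduced, and the placement rule is a reading of the GraB idea rather than the paper's pseudocode.

```python
import numpy as np

# Greedy sign-balancing reordering (GraB-style sketch): the next epoch's
# permutation is built so that partial sums of centered gradients stay small.

def balance_reorder(grads, order):
    mean = grads.mean(axis=0)
    run = np.zeros_like(mean)
    front, back = [], []
    for i in order:
        c = grads[i] - mean                    # centered per-example gradient
        if np.linalg.norm(run + c) <= np.linalg.norm(run - c):
            run += c
            front.append(i)                    # sign +1: placed toward the front
        else:
            run -= c
            back.append(i)                     # sign -1: placed toward the back
    return front + back[::-1]                  # permutation for the next epoch

rng = np.random.default_rng(0)
n, d = 8, 3
grads = rng.normal(size=(n, d))                # stand-ins for per-example gradients
print("next epoch order:", balance_reorder(grads, list(range(n))))
```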