Random reshuffling: Simple analysis with vast improvements
K Mishchenko, A Khaled… - Advances in Neural …, 2020 - proceedings.neurips.cc
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
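The RR scheme this snippet describes, gradient steps over a freshly shuffled pass of the data each epoch, can be sketched as follows. The toy quadratic objective, step size, and epoch count are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum objective (an assumption for illustration):
# f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
n, d = 32, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th component loss."""
    return (A[i] @ x - b[i]) * A[i]

def random_reshuffling(x0, lr=0.05, epochs=50):
    x = x0.copy()
    for _ in range(epochs):
        perm = rng.permutation(n)   # fresh shuffle of the data each epoch
        for i in perm:              # one full pass: a gradient step per component
            x = x - lr * grad_i(x, i)
    return x

x_rr = random_reshuffling(np.zeros(d))
```

Unlike with-replacement SGD, every component is visited exactly once per epoch, which is what the paper's analysis exploits.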
On the convergence of federated averaging with cyclic client participation
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
Recent theoretical advances in non-convex optimization
Motivated by recent increased interest in optimization algorithms for non-convex
optimization in application to training deep neural networks and other optimization problems …
On the impact of machine learning randomness on group fairness
Statistical measures for group fairness in machine learning reflect the gap in performance of
algorithms across different groups. These measures, however, exhibit a high variance …
Why globally re-shuffle? Revisiting data shuffling in large scale deep learning
Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural
Networks (DNN). SGD iterates the input data set in each training epoch processing data …
SGD with shuffling: optimal rates without component convexity and large epoch requirements
We study without-replacement SGD for solving finite-sum optimization problems.
Specifically, depending on how the indices of the finite-sum are shuffled, we consider the …
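The distinction the snippet draws, how the indices of the finite sum are shuffled, can be illustrated with three common without-replacement orderings. The scheme names are my labels and may not match the paper's exact taxonomy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8  # number of components in the finite sum

def epoch_order(scheme, fixed_perm):
    """Index order for one epoch under three without-replacement schemes.
    (Scheme names are illustrative, not necessarily the paper's.)"""
    if scheme == "incremental":       # fixed order 0..n-1 every epoch
        return np.arange(n)
    if scheme == "shuffle_once":      # one permutation drawn up front, reused
        return fixed_perm
    if scheme == "random_reshuffle":  # fresh permutation drawn each epoch
        return rng.permutation(n)
    raise ValueError(scheme)

fixed = rng.permutation(n)
orders = {s: [epoch_order(s, fixed) for _ in range(3)]
          for s in ("incremental", "shuffle_once", "random_reshuffle")}
```

All three visit each index exactly once per epoch; they differ only in how the order is randomized across epochs.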
Convergence of random reshuffling under the Kurdyka–Łojasiewicz inequality
We study the random reshuffling (RR) method for smooth nonconvex optimization problems
with a finite-sum structure. Though this method is widely utilized in practice, e.g., in the …
Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
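The two methods the snippet compares can be sketched side by side: minibatch SGD averages gradients every step, while local SGD lets each worker take several local steps before averaging the iterates. The per-worker quadratic losses, learning rate, and round counts are toy assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
M, d = 4, 3  # number of workers, problem dimension
# Assumed toy setup: worker m minimizes f_m(x) = 0.5 * ||x - c_m||^2,
# so the minimizer of the average loss is the mean of the c_m.
C = rng.normal(size=(M, d))

def grad(m, x):
    return x - C[m]

def minibatch_sgd(rounds=100, lr=0.1):
    # One synchronized step per round: average the workers' gradients.
    x = np.zeros(d)
    for _ in range(rounds):
        g = np.mean([grad(m, x) for m in range(M)], axis=0)
        x = x - lr * g
    return x

def local_sgd(rounds=100, local_steps=5, lr=0.1):
    # Each worker takes several local steps; iterates are then averaged.
    x = np.zeros(d)
    for _ in range(rounds):
        local_iterates = []
        for m in range(M):
            xm = x.copy()
            for _ in range(local_steps):
                xm = xm - lr * grad(m, xm)
            local_iterates.append(xm)
        x = np.mean(local_iterates, axis=0)
    return x

x_mb = minibatch_sgd()
x_loc = local_sgd()
```

Local SGD trades communication rounds for extra local computation, which is the tension the paper's bounds quantify.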
AsGrad: A sharp unified analysis of asynchronous-SGD algorithms
R Islamov, M Safaryan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
Fast distributionally robust learning with variance-reduced min-max optimization
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for
building reliable machine learning systems for real-world applications, reflecting the need …