Random reshuffling: Simple analysis with vast improvements

K Mishchenko, A Khaled… - Advances in Neural …, 2020 - proceedings.neurips.cc
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
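
As an illustration of the RR scheme this abstract describes, here is a minimal NumPy sketch assuming a least-squares finite sum; the objective, step size, and epoch count are placeholders, not the paper's setting.

import numpy as np

def random_reshuffling(A, b, x0, step=0.01, epochs=50, seed=0):
    """Random Reshuffling for f(x) = (1/n) * sum_i 0.5 * (A[i] @ x - b[i])**2."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    n = len(b)
    for _ in range(epochs):
        perm = rng.permutation(n)            # fresh permutation each epoch
        for i in perm:                       # one gradient step per component
            x -= step * (A[i] @ x - b[i]) * A[i]
    return x

# toy usage
A = np.random.randn(100, 5); x_star = np.ones(5); b = A @ x_star
print(random_reshuffling(A, b, x0=np.zeros(5)))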

On the convergence of federated averaging with cyclic client participation

YJ Cho, P Sharma, G Joshi, Z Xu… - International …, 2023 - proceedings.mlr.press
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
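
A hedged sketch of FedAvg with cyclic client participation as named in the title: each round only one client group is active, and groups are visited in a fixed cyclic order. The grouping, local step count, and toy objectives are illustrative assumptions, not the paper's exact protocol.

import numpy as np

def fedavg_cyclic(client_grads, x0, rounds=12, num_groups=4, local_steps=5, lr=0.1):
    """FedAvg where only one group of clients participates per round,
    and groups are visited in a fixed cyclic order."""
    x = x0.copy()
    groups = np.array_split(np.arange(len(client_grads)), num_groups)
    for r in range(rounds):
        active = groups[r % num_groups]          # cyclic client participation
        local_models = []
        for c in active:
            x_c = x.copy()
            for _ in range(local_steps):         # local SGD on client c
                x_c -= lr * client_grads[c](x_c)
            local_models.append(x_c)
        x = np.mean(local_models, axis=0)        # server averages the local models
    return x

# toy usage: each client holds a quadratic 0.5 * (x - t)^2 with a different optimum t
grads = [lambda x, t=t: x - t for t in np.linspace(-1.0, 1.0, 8)]
print(fedavg_cyclic(grads, x0=np.zeros(1)))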

Recent theoretical advances in non-convex optimization

M Danilova, P Dvurechensky, A Gasnikov… - … and Probability: With a …, 2022 - Springer
Motivated by the recent surge of interest in algorithms for non-convex optimization,
with applications to training deep neural networks and other optimization problems …

On the impact of machine learning randomness on group fairness

P Ganesh, H Chang, M Strobel, R Shokri - Proceedings of the 2023 ACM …, 2023 - dl.acm.org
Statistical measures for group fairness in machine learning reflect the gap in performance of
algorithms across different groups. These measures, however, exhibit a high variance …
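
To make the kind of measure and its run-to-run variance concrete, here is a small sketch that computes an accuracy gap between two groups across several seeds; the stand-in predictions replace an actual retrained model and are purely illustrative.

import numpy as np

def accuracy_gap(y_true, y_pred, group):
    """Group fairness measured as the absolute accuracy gap between two groups."""
    acc = lambda mask: np.mean(y_pred[mask] == y_true[mask])
    return abs(acc(group == 0) - acc(group == 1))

# variance of the gap across training seeds (a stand-in model replaces retraining)
gaps = []
for seed in range(20):
    rng = np.random.default_rng(seed)
    y_true = rng.integers(0, 2, 1000)
    group = rng.integers(0, 2, 1000)
    flip = rng.random(1000) < 0.1                 # seed-dependent prediction errors
    y_pred = np.where(flip, 1 - y_true, y_true)
    gaps.append(accuracy_gap(y_true, y_pred, group))
print(f"gap mean={np.mean(gaps):.3f}, std={np.std(gaps):.3f}")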

Why globally re-shuffle? Revisiting data shuffling in large scale deep learning

TT Nguyen, F Trahay, J Domke, A Drozd… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural
Networks (DNN). SGD iterates over the input data set in each training epoch, processing data …
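
A minimal sketch contrasting the two shuffling strategies the title alludes to: a global reshuffle of the whole dataset versus shuffling only within fixed partitions (e.g., per-node shards). The partitioning below is an illustrative assumption.

import numpy as np

def global_shuffle(indices, rng):
    """Reshuffle the entire dataset before each epoch."""
    return rng.permutation(indices)

def partial_shuffle(indices, rng, num_partitions=4):
    """Shuffle only within fixed partitions (e.g., per-node shards of the data),
    avoiding a full global pass over the dataset."""
    parts = np.array_split(indices, num_partitions)
    return np.concatenate([rng.permutation(p) for p in parts])

rng = np.random.default_rng(0)
idx = np.arange(16)
print(global_shuffle(idx, rng))
print(partial_shuffle(idx, rng))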

SGD with shuffling: optimal rates without component convexity and large epoch requirements

K Ahn, C Yun, S Sra - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study without-replacement SGD for solving finite-sum optimization problems.
Specifically, depending on how the indices of the finite-sum are shuffled, we consider the …
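
A short sketch of the index-ordering schemes typically compared in this line of work (incremental order, shuffle-once, and per-epoch random reshuffling); the exact set studied in the paper is only partially visible in the snippet, so treat this as a generic illustration.

import numpy as np

def epoch_orders(n, epochs, scheme, seed=0):
    """Yield the index order used in each epoch of without-replacement SGD."""
    rng = np.random.default_rng(seed)
    once = rng.permutation(n)                  # drawn once, reused by shuffle-once
    for _ in range(epochs):
        if scheme == "incremental":            # fixed order 0..n-1 every epoch
            yield np.arange(n)
        elif scheme == "shuffle_once":         # a single permutation, reused
            yield once
        elif scheme == "random_reshuffling":   # a fresh permutation every epoch
            yield rng.permutation(n)

for order in epoch_orders(n=5, epochs=2, scheme="random_reshuffling"):
    print(order)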

Convergence of random reshuffling under the Kurdyka–Łojasiewicz inequality

X Li, A Milzarek, J Qiu - SIAM Journal on Optimization, 2023 - SIAM
We study the random reshuffling (RR) method for smooth nonconvex optimization problems
with a finite-sum structure. Though this method is widely utilized in practice, e.g., in the …

Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond

C Yun, S Rajput, S Sra - arXiv preprint arXiv:2110.10342, 2021 - arxiv.org
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
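
A hedged sketch of the two baselines named in this abstract: minibatch SGD takes one step per round on the averaged gradient, while local SGD runs several local steps per worker and then averages the models. Objectives and step sizes are placeholders.

import numpy as np

def minibatch_sgd_round(x, worker_grads, lr):
    """One communication round: average the workers' gradients at the current
    iterate and take a single step."""
    g = np.mean([grad(x) for grad in worker_grads], axis=0)
    return x - lr * g

def local_sgd_round(x, worker_grads, lr, local_steps=5):
    """One communication round: each worker runs several local SGD steps,
    then the server averages the resulting models (federated averaging)."""
    local_models = []
    for grad in worker_grads:
        x_w = x.copy()
        for _ in range(local_steps):
            x_w -= lr * grad(x_w)
        local_models.append(x_w)
    return np.mean(local_models, axis=0)

# toy usage: three workers with quadratics 0.5 * (x - t)^2
grads = [lambda x, t=t: x - t for t in (-1.0, 0.0, 2.0)]
x = np.zeros(1)
print(minibatch_sgd_round(x, grads, lr=0.1), local_sgd_round(x, grads, lr=0.1))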

AsGrad: A sharp unified analysis of asynchronous-SGD algorithms

R Islamov, M Safaryan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
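
A simplified sketch of the asynchronous-SGD pattern analysed in this line of work: the server applies gradients that were computed at stale iterates. The delay model below is a simple simulation, not the algorithms studied in the paper.

import numpy as np

def async_sgd(grad, x0, steps=200, lr=0.05, max_delay=3, seed=0):
    """Server applies whichever gradient arrives next; that gradient was computed
    at an iterate up to `max_delay` server steps old (staleness is simulated)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    history = [x.copy()]
    for _ in range(steps):
        delay = rng.integers(0, min(max_delay + 1, len(history)))
        stale_x = history[-(delay + 1)]        # iterate the worker actually saw
        x = x - lr * grad(stale_x)             # update with a (possibly) stale gradient
        history.append(x.copy())
    return x

# toy usage: minimize 0.5 * ||x||^2, whose gradient is x
print(async_sgd(lambda x: x, x0=np.ones(3)))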

Fast distributionally robust learning with variance-reduced min-max optimization

Y Yu, T Lin, EV Mazumdar… - … Conference on Artificial …, 2022 - proceedings.mlr.press
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for
building reliable machine learning systems for real-world applications, reflecting the need …
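
A hedged sketch of the min-max structure behind distributionally robust learning: descent on the model parameters and mirror ascent on adversarial group weights over the simplex. This is plain gradient descent-ascent; the paper's contribution (variance reduction) is not implemented here.

import numpy as np

def dro_gda(group_losses, group_grads, x0, steps=300, lr_x=0.05, lr_w=0.5):
    """Distributionally robust learning as the min-max problem
    min_x max_{w in simplex} sum_k w_k * loss_k(x),
    solved here by plain gradient descent-ascent (no variance reduction)."""
    K = len(group_losses)
    x, w = x0.copy(), np.full(K, 1.0 / K)
    for _ in range(steps):
        losses = np.array([loss(x) for loss in group_losses])
        w = w * np.exp(lr_w * losses)            # mirror-ascent step on the weights
        w /= w.sum()                             # renormalize onto the simplex
        g = sum(w_k * g_k(x) for w_k, g_k in zip(w, group_grads))
        x = x - lr_x * g                         # descent step on the model
    return x, w

# toy usage: two groups with quadratics 0.5 * (x - t)^2, t in {-1, 2}
losses = [lambda x, t=t: 0.5 * np.sum((x - t) ** 2) for t in (-1.0, 2.0)]
grads = [lambda x, t=t: x - t for t in (-1.0, 2.0)]
print(dro_gda(losses, grads, x0=np.zeros(1)))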