Random reshuffling: Simple analysis with vast improvements
K Mishchenko, A Khaled… - Advances in Neural …, 2020 - proceedings.neurips.cc
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
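The RR scheme this snippet describes, gradient steps over a freshly shuffled pass of the data each epoch, can be sketched as follows. The toy quadratic objective, step size, and epoch count are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum objective (an assumption for illustration):
# f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
n, d = 32, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th component loss."""
    return (A[i] @ x - b[i]) * A[i]

def random_reshuffling(x0, lr=0.05, epochs=50):
    x = x0.copy()
    for _ in range(epochs):
        perm = rng.permutation(n)   # fresh shuffle of the data each epoch
        for i in perm:              # one full pass: a gradient step per component
            x = x - lr * grad_i(x, i)
    return x

x_rr = random_reshuffling(np.zeros(d))
```

Unlike with-replacement SGD, every component is visited exactly once per epoch, which is what the paper's analysis exploits.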
On the convergence of federated averaging with cyclic client participation
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
Recent theoretical advances in non-convex optimization
Motivated by recent increased interest in optimization algorithms for non-convex
optimization in application to training deep neural networks and other optimization problems …
On the impact of machine learning randomness on group fairness
Statistical measures for group fairness in machine learning reflect the gap in performance of
algorithms across different groups. These measures, however, exhibit a high variance …
Why globally re-shuffle? Revisiting data shuffling in large scale deep learning
Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural
Networks (DNN). SGD iterates the input data set in each training epoch processing data …
SGD with shuffling: optimal rates without component convexity and large epoch requirements
We study without-replacement SGD for solving finite-sum optimization problems.
Specifically, depending on how the indices of the finite-sum are shuffled, we consider the …
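The distinction the snippet draws, how the indices of the finite sum are shuffled, can be illustrated with three common without-replacement orderings. The scheme names are my labels and may not match the paper's exact taxonomy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8  # number of components in the finite sum

def epoch_order(scheme, fixed_perm):
    """Index order for one epoch under three without-replacement schemes.
    (Scheme names are illustrative, not necessarily the paper's.)"""
    if scheme == "incremental":       # fixed order 0..n-1 every epoch
        return np.arange(n)
    if scheme == "shuffle_once":      # one permutation drawn up front, reused
        return fixed_perm
    if scheme == "random_reshuffle":  # fresh permutation drawn each epoch
        return rng.permutation(n)
    raise ValueError(scheme)

fixed = rng.permutation(n)
orders = {s: [epoch_order(s, fixed) for _ in range(3)]
          for s in ("incremental", "shuffle_once", "random_reshuffle")}
```

All three visit each index exactly once per epoch; they differ only in how the order is randomized across epochs.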
Convergence of random reshuffling under the Kurdyka–Łojasiewicz inequality
We study the random reshuffling (RR) method for smooth nonconvex optimization problems
with a finite-sum structure. Though this method is widely utilized in practice, e.g., in the …
Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
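The two methods the snippet compares can be sketched side by side: minibatch SGD averages gradients every step, while local SGD lets each worker take several local steps before averaging the iterates. The per-worker quadratic losses, learning rate, and round counts are toy assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
M, d = 4, 3  # number of workers, problem dimension
# Assumed toy setup: worker m minimizes f_m(x) = 0.5 * ||x - c_m||^2,
# so the minimizer of the average loss is the mean of the c_m.
C = rng.normal(size=(M, d))

def grad(m, x):
    return x - C[m]

def minibatch_sgd(rounds=100, lr=0.1):
    # One synchronized step per round: average the workers' gradients.
    x = np.zeros(d)
    for _ in range(rounds):
        g = np.mean([grad(m, x) for m in range(M)], axis=0)
        x = x - lr * g
    return x

def local_sgd(rounds=100, local_steps=5, lr=0.1):
    # Each worker takes several local steps; iterates are then averaged.
    x = np.zeros(d)
    for _ in range(rounds):
        local_iterates = []
        for m in range(M):
            xm = x.copy()
            for _ in range(local_steps):
                xm = xm - lr * grad(m, xm)
            local_iterates.append(xm)
        x = np.mean(local_iterates, axis=0)
    return x

x_mb = minibatch_sgd()
x_loc = local_sgd()
```

Local SGD trades communication rounds for extra local computation, which is the tension the paper's bounds quantify.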
AsGrad: A sharp unified analysis of asynchronous-SGD algorithms
R Islamov, M Safaryan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
Fast distributionally robust learning with variance-reduced min-max optimization
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for
building reliable machine learning systems for real-world applications, reflecting the need …