Byzantine machine learning: A primer

R Guerraoui, N Gupta, R Pinot - ACM Computing Surveys, 2024 - dl.acm.org
The problem of Byzantine resilience in distributed machine learning, also known as Byzantine machine
learning, consists of designing distributed algorithms that can train an accurate model …
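
A core primitive in this area is replacing the plain average of worker gradients with a robust aggregation rule. The sketch below (an editorial illustration in Python/NumPy, not taken from the primer) shows how a coordinate-wise median keeps one worker that sends arbitrarily corrupted gradients from destroying the aggregate:

    import numpy as np

    def coordinate_wise_median(gradients):
        # Take the median of every coordinate across workers; a minority of
        # arbitrarily corrupted vectors cannot drag the result far away.
        return np.median(np.stack(gradients), axis=0)

    rng = np.random.default_rng(0)
    true_grad = np.array([1.0, -2.0, 0.5])
    honest = [true_grad + 0.01 * rng.standard_normal(3) for _ in range(4)]
    byzantine = [np.array([1e6, -1e6, 1e6])]             # one worker sends garbage

    print(coordinate_wise_median(honest + byzantine))    # close to true_grad
    print(np.mean(honest + byzantine, axis=0))           # the naive average is ruined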

EF21: A new, simpler, theoretically better, and practically faster error feedback

P Richtárik, I Sokolov… - Advances in Neural …, 2021 - proceedings.neurips.cc
Error feedback (EF), also known as error compensation, is an immensely popular
convergence stabilization mechanism in the context of distributed training of supervised …
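
For context, the following is a minimal single-machine simulation of an EF21-style error-feedback loop, where each worker maintains a gradient estimate and only transmits a compressed correction to it. The quadratic objective, step size, and top-k compressor are illustrative choices, not taken from the paper:

    import numpy as np

    def top_k(v, k):
        # Keep the k largest-magnitude coordinates, zero out the rest.
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    # Toy problem: worker i holds f_i(x) = 0.5 * ||x - b_i||^2, so its gradient is x - b_i.
    rng = np.random.default_rng(0)
    n, d, k, gamma = 5, 10, 3, 0.1
    b = rng.standard_normal((n, d))
    x = np.zeros(d)
    g = np.zeros((n, d))                         # per-worker gradient estimates (the error-feedback state)

    for _ in range(200):
        grads = x - b                            # local gradients at the current model
        c = np.array([top_k(grads[i] - g[i], k) for i in range(n)])   # compressed corrections
        g = g + c                                # each worker updates its estimate
        x = x - gamma * g.mean(axis=0)           # server step with the aggregated estimates

    # x approaches the mean of the b_i, the minimizer of the average objective.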

Communication compression techniques in distributed deep learning: A survey

Z Wang, M Wen, Y Xu, Y Zhou, JH Wang… - Journal of Systems …, 2023 - Elsevier
Training data and neural network models are growing ever larger, making the training
time of deep learning on a single machine unbearably long. To reduce …
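
Two textbook operators that compression surveys typically cover are random-k sparsification (unbiased) and scaled sign quantization (biased, roughly one bit per coordinate). The sketch below uses the standard scalings and is an editorial illustration, not a claim about this survey's taxonomy:

    import numpy as np

    def rand_k(v, k, rng):
        # Unbiased random sparsification: keep k random coordinates, rescale by d/k.
        d = v.size
        out = np.zeros_like(v)
        idx = rng.choice(d, size=k, replace=False)
        out[idx] = v[idx] * (d / k)
        return out

    def scaled_sign(v):
        # Biased 1-bit quantization: transmit signs plus a single scale (the mean magnitude).
        return np.sign(v) * np.mean(np.abs(v))

    rng = np.random.default_rng(0)
    g = rng.standard_normal(8)
    print(rand_k(g, 2, rng))    # sparse, equals g in expectation
    print(scaled_sign(g))       # dense, but only signs and one scalar need to be sent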

Fast federated learning in the presence of arbitrary device unavailability

X Gu, K Huang, J Zhang… - Advances in Neural …, 2021 - proceedings.neurips.cc
Federated learning (FL) coordinates with numerous heterogeneous devices to
collaboratively train a shared model while preserving user privacy. Despite its multiple …

FedNL: Making Newton-type methods applicable to federated learning

M Safaryan, R Islamov, X Qian, P Richtárik - arXiv preprint arXiv …, 2021 - arxiv.org
Inspired by the recent work of Islamov et al. (2021), we propose a family of Federated Newton
Learn (FedNL) methods, which we believe is a marked step in the direction of making …

SoteriaFL: A unified framework for private federated learning with communication compression

Z Li, H Zhao, B Li, Y Chi - Advances in Neural Information …, 2022 - proceedings.neurips.cc
To enable large-scale machine learning in bandwidth-hungry environments such as
wireless networks, significant progress has been made recently in designing communication …
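
As a heavily simplified illustration of the ingredients such frameworks combine (not SoteriaFL's actual algorithm), a worker can clip its gradient, add Gaussian noise for local differential privacy, and only then compress the message; since compression is post-processing of the noised vector, it does not weaken the privacy guarantee. The clipping threshold, noise scale, and top-k compressor below are hypothetical choices:

    import numpy as np

    def privatize_and_compress(grad, clip, sigma, k, rng):
        # Clip to bound sensitivity, add Gaussian noise, then top-k sparsify the result.
        norm = np.linalg.norm(grad)
        clipped = grad * min(1.0, clip / (norm + 1e-12))
        noisy = clipped + sigma * clip * rng.standard_normal(grad.shape)
        out = np.zeros_like(noisy)
        idx = np.argsort(np.abs(noisy))[-k:]
        out[idx] = noisy[idx]
        return out

    rng = np.random.default_rng(0)
    message = privatize_and_compress(rng.standard_normal(10), clip=1.0, sigma=0.5, k=3, rng=rng)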

Stochastic gradient descent-ascent: Unified theory and new efficient methods

A Beznosikov, E Gorbunov… - International …, 2023 - proceedings.mlr.press
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent
algorithms for solving min-max optimization and variational inequality problems (VIPs) …
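
The basic simultaneous SGDA update descends in the minimization variable and ascends in the maximization variable using stochastic gradients. Below is a minimal sketch on a toy strongly-convex-strongly-concave saddle-point problem (an editorial example, not from the paper):

    import numpy as np

    # min_x max_y f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, whose unique saddle point is (0, 0).
    # Stochastic gradients are simulated by adding Gaussian noise to the exact derivatives.
    rng = np.random.default_rng(0)
    x, y, gamma, sigma = 3.0, -2.0, 0.1, 0.01

    for _ in range(500):
        gx = (x + y) + sigma * rng.standard_normal()    # noisy df/dx
        gy = (x - y) + sigma * rng.standard_normal()    # noisy df/dy
        x, y = x - gamma * gx, y + gamma * gy           # simultaneous descent-ascent step

    # (x, y) settles in a small, noise-dominated neighborhood of the saddle point.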

EF21-P and friends: Improved theoretical communication complexity for distributed optimization with bidirectional compression

K Gruntkowska, A Tyurin… - … Conference on Machine …, 2023 - proceedings.mlr.press
In this work we focus our attention on distributed optimization problems in the context where
the communication time between the server and the workers is non-negligible. We obtain …
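
The naive way to save bandwidth in both directions is to compress the worker-to-server gradients as well as the server-to-worker model updates. The bare-bones loop below (an editorial sketch with a top-k operator and a toy quadratic, not the EF21-P mechanism) only illustrates the mechanics; methods in this line of work add error-feedback-style corrections so that bidirectional compression does not hurt convergence, which this naive loop does not attempt:

    import numpy as np

    def top_k(v, k):
        # Keep the k largest-magnitude coordinates, zero out the rest.
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(0)
    n, d, k, gamma = 4, 12, 3, 0.2
    b = rng.standard_normal((n, d))          # worker i minimizes 0.5 * ||x - b_i||^2
    x_server = np.zeros(d)
    x_workers = np.zeros((n, d))             # each worker's local copy of the model

    for _ in range(300):
        # Uplink: workers send compressed gradients computed at their local model copies.
        up = np.array([top_k(x_workers[i] - b[i], k) for i in range(n)])
        x_new = x_server - gamma * up.mean(axis=0)
        # Downlink: the server broadcasts a compressed model update instead of the full model.
        delta = top_k(x_new - x_server, k)
        x_server = x_server + delta
        x_workers = x_workers + delta        # workers stay in sync via the compressed broadcast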

Variance reduction is an antidote to Byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top

E Gorbunov, S Horváth, P Richtárik, G Gidel - arXiv preprint arXiv …, 2022 - arxiv.org
Byzantine-robustness has been gaining a lot of attention due to the growing interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …

Recent theoretical advances in non-convex optimization

M Danilova, P Dvurechensky, A Gasnikov… - … and Probability: With a …, 2022 - Springer
Motivated by the recent surge of interest in algorithms for non-convex
optimization, driven by applications such as training deep neural networks and other optimization problems …