Byzantine machine learning: A primer
The problem of Byzantine resilience in distributed machine learning, also known as Byzantine machine
learning, consists of designing distributed algorithms that can train an accurate model …
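For context on the setting this primer describes, the standard defense is to replace the server's plain average of worker gradients with a robust aggregation rule. Below is a minimal sketch of coordinate-wise median aggregation; it is a generic illustration of the idea, not a method taken from the primer, and the toy worker gradients are assumptions.

```python
import numpy as np

def coordinate_wise_median(worker_grads):
    """Robustly aggregate worker gradients by taking the median of each coordinate.

    worker_grads: array of shape (n_workers, dim). Even if a minority of rows is
    arbitrary (Byzantine), each coordinate's median stays within the honest values.
    """
    return np.median(worker_grads, axis=0)

# Toy usage: 4 honest workers plus 1 Byzantine worker sending a huge vector.
honest = np.random.randn(4, 10)
byzantine = 1e6 * np.ones((1, 10))
grads = np.vstack([honest, byzantine])

robust_step = coordinate_wise_median(grads)  # unaffected by the outlier's magnitude
naive_step = grads.mean(axis=0)              # dominated by the Byzantine vector
```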
EF21: A new, simpler, theoretically better, and practically faster error feedback
P Richtárik, I Sokolov… - Advances in Neural …, 2021 - proceedings.neurips.cc
Error feedback (EF), also known as error compensation, is an immensely popular
convergence stabilization mechanism in the context of distributed training of supervised …
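To make the mechanism concrete, here is a minimal sketch of an EF21-style error-feedback loop, assuming a greedy top-k compressor and toy quadratic local losses f_i(x) = 0.5·||x − b_i||² for concreteness; the stepsize, k, and initialization are illustrative choices, not values from the paper.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest (a contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
n, dim, k, gamma = 8, 50, 5, 0.1
b = rng.standard_normal((n, dim))            # worker i's loss: 0.5 * ||x - b[i]||^2

x = np.zeros(dim)
g = np.array([x - b[i] for i in range(n)])   # per-worker states g_i, here initialized to the local gradients

for _ in range(200):
    x = x - gamma * g.mean(axis=0)           # server step using the average of the workers' states
    for i in range(n):
        grad_i = x - b[i]                     # fresh local gradient
        g[i] = g[i] + top_k(grad_i - g[i], k) # each worker communicates only the compressed correction
```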
Communication compression techniques in distributed deep learning: A survey
Training data and neural network models are getting increasingly large, and the training
time of deep learning on a single machine is becoming unbearably long. To reduce …
Fast federated learning in the presence of arbitrary device unavailability
Federated learning (FL) coordinates numerous heterogeneous devices to collaboratively
train a shared model while preserving user privacy. Despite its multiple …
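As background for this and the other federated-learning entries, the basic FedAvg-style loop that such methods build on can be sketched as follows; this is a generic illustration with toy quadratic device losses and random device availability, not the specific algorithm this paper proposes for handling arbitrary unavailability.

```python
import numpy as np

rng = np.random.default_rng(1)
n_devices, dim, local_steps, lr, rounds = 20, 10, 5, 0.1, 50
targets = rng.standard_normal((n_devices, dim))   # device i's loss: 0.5 * ||w - targets[i]||^2

w_global = np.zeros(dim)
for _ in range(rounds):
    # Only a random subset of devices is available in each round.
    available = rng.choice(n_devices, size=n_devices // 2, replace=False)
    updates = []
    for i in available:
        w = w_global.copy()
        for _ in range(local_steps):
            w -= lr * (w - targets[i])            # local gradient steps on device i
        updates.append(w)
    w_global = np.mean(updates, axis=0)           # server averages the returned models
```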
FedNL: Making Newton-type methods applicable to federated learning
Inspired by recent work of Islamov et al. (2021), we propose a family of Federated Newton
Learn (FedNL) methods, which we believe is a marked step in the direction of making …
SoteriaFL: A unified framework for private federated learning with communication compression
To enable large-scale machine learning in bandwidth-hungry environments such as
wireless networks, significant progress has been made recently in designing communication …
Stochastic gradient descent-ascent: Unified theory and new efficient methods
A Beznosikov, E Gorbunov… - International …, 2023 - proceedings.mlr.press
Abstract Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent
algorithms for solving min-max optimization and variational inequality problems (VIPs) …
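For readers unfamiliar with the algorithm named here, a plain stochastic gradient descent-ascent iteration on a toy strongly-convex-strongly-concave saddle-point problem looks like the following; this is a generic SGDA sketch (the objective, stepsize, and noise level are assumptions), not one of the paper's new methods.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 5
A = rng.standard_normal((dim, dim))

# Saddle-point objective: f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2 (solution x = y = 0).
x = rng.standard_normal(dim)
y = rng.standard_normal(dim)
lr, noise, steps = 0.05, 0.1, 2000

for _ in range(steps):
    gx = x + A @ y + noise * rng.standard_normal(dim)    # stochastic gradient in x
    gy = A.T @ x - y + noise * rng.standard_normal(dim)  # stochastic gradient in y
    x = x - lr * gx                                      # descent step on the min variable
    y = y + lr * gy                                      # ascent step on the max variable
```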
EF21-P and friends: Improved theoretical communication complexity for distributed optimization with bidirectional compression
K Gruntkowska, A Tyurin… - … Conference on Machine …, 2023 - proceedings.mlr.press
In this work, we focus on distributed optimization problems in the setting where the
communication time between the server and the workers is non-negligible. We obtain …
Variance reduction is an antidote to Byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top
Byzantine-robustness has been gaining a lot of attention due to the growing interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …
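For intuition on the variance-reduction ingredient named in the title, here is a minimal sketch of a SAGA-style variance-reduced gradient estimator on a toy finite-sum problem; it is a generic single-machine illustration (the quadratic components and stepsize are assumptions) and omits the robust aggregation and compression that the paper combines it with.

```python
import numpy as np

rng = np.random.default_rng(3)
n, dim, lr, steps = 100, 20, 0.05, 2000
b = rng.standard_normal((n, dim))               # f_j(x) = 0.5 * ||x - b[j]||^2, so grad f_j(x) = x - b[j]

x = np.zeros(dim)
table = np.array([x - b[j] for j in range(n)])  # stored gradient for each component
table_mean = table.mean(axis=0)

for _ in range(steps):
    j = rng.integers(n)
    grad_j = x - b[j]
    g = grad_j - table[j] + table_mean          # SAGA estimator: unbiased, with shrinking variance
    table_mean += (grad_j - table[j]) / n       # maintain the running mean of the table
    table[j] = grad_j
    x = x - lr * g
```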
Recent theoretical advances in non-convex optimization
Motivated by the recent increased interest in optimization algorithms for non-convex
optimization, in application to training deep neural networks and other optimization problems …