A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications
Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …
Private retrieval, computing, and learning: Recent progress and future challenges
Most of our lives are conducted in cyberspace. The human notion of privacy translates
into a cyber notion of privacy on many functions that take place in cyberspace. This …
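To make the private-retrieval setting concrete, here is a minimal sketch of the classical two-server private information retrieval scheme over a replicated bit database; it is an illustrative toy rather than a construction from this survey, and the names pir_two_server and server_answer are ours.

import secrets

def pir_two_server(database, i):
    """Toy 2-server PIR: the user learns database[i] while neither of the two
    non-colluding servers, each holding a full replica, learns the index i."""
    n = len(database)
    # Query 1: a uniformly random 0/1 selection mask.
    mask1 = [secrets.randbelow(2) for _ in range(n)]
    # Query 2: the same mask with position i flipped; each query alone is
    # uniformly random, so it reveals nothing about i.
    mask2 = mask1.copy()
    mask2[i] ^= 1

    def server_answer(replica, mask):
        # A server XORs together the database bits selected by its mask.
        ans = 0
        for bit, selected in zip(replica, mask):
            if selected:
                ans ^= bit
        return ans

    # The two answers differ exactly by the desired bit.
    return server_answer(database, mask1) ^ server_answer(database, mask2)

# Example: retrieve bit 3 of an 8-bit database.
db = [1, 0, 1, 1, 0, 0, 1, 0]
assert pir_two_server(db, 3) == db[3]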
FedPAQ: A communication-efficient federated learning method with periodic averaging and quantization
Federated learning is a distributed framework in which a model is trained over a
set of devices, while keeping data localized. This framework faces several systems-oriented …
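As a rough illustration of the two ingredients named in the title, periodic averaging and quantized updates, the sketch below assumes toy least-squares clients and a simple uniform stochastic quantizer; it is not the authors' reference implementation, and quantize and fedpaq_round are illustrative names.

import numpy as np

def quantize(v, levels=16):
    """Uniform stochastic quantization: only a coarse, unbiased version of the
    local model update is communicated to the server."""
    scale = np.max(np.abs(v)) + 1e-12
    y = np.abs(v) / scale * (levels - 1)
    lower = np.floor(y)
    q = lower + (np.random.rand(*v.shape) < (y - lower))  # randomized rounding
    return np.sign(v) * q * scale / (levels - 1)

def fedpaq_round(global_w, client_data, local_steps=5, lr=0.1):
    """One communication round: each client runs several local SGD steps
    (periodic averaging) and sends back a quantized model update."""
    updates = []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = X.T @ (X @ w - y) / len(y)   # least-squares gradient
            w -= lr * grad
        updates.append(quantize(w - global_w))
    return global_w + np.mean(updates, axis=0)

# Toy run: two clients, each with a small synthetic least-squares problem.
np.random.seed(0)
clients = [(np.random.randn(20, 3), np.random.randn(20)) for _ in range(2)]
w = np.zeros(3)
for _ in range(10):
    w = fedpaq_round(w, clients)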
Lagrange coded computing: Optimal design for resiliency, security, and privacy
We consider a scenario involving computations over a massive dataset stored distributedly
across multiple workers, which is at the core of distributed learning algorithms. We propose …
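The numeric toy below sketches the Lagrange-encoding idea: each worker receives one evaluation of the low-degree polynomial that interpolates the data chunks, so applying a polynomial function f to the encoded inputs yields evaluations of f composed with that interpolant, which the master can recover from any sufficiently large subset of workers. The actual scheme works over a finite field and adds random padding for security and privacy; the chunk sizes, evaluation points, and the choice f(x) = x^2 here are illustrative assumptions.

import numpy as np

def lagrange_basis(points, z):
    """Values at z of the Lagrange basis polynomials defined by `points`."""
    vals = []
    for j, a in enumerate(points):
        num = np.prod([z - b for k, b in enumerate(points) if k != j])
        den = np.prod([a - b for k, b in enumerate(points) if k != j])
        vals.append(num / den)
    return np.array(vals)

K, N = 3, 7                        # data chunks, workers
alphas = np.arange(1.0, K + 1)     # interpolation points carrying the data
betas = np.arange(4.0, 4.0 + N)    # evaluation points handed to the workers
chunks = [np.random.randn(4) for _ in range(K)]
f = lambda x: x * x                # degree-2 polynomial applied elementwise

# Encoding: worker i receives u(beta_i), where u(z) interpolates the chunks,
# i.e. u(alpha_j) = chunks[j].
encoded = [sum(c * l for c, l in zip(chunks, lagrange_basis(alphas, b)))
           for b in betas]

# Each worker applies f to its encoded input. f(u(z)) has degree 2*(K-1) = 4,
# so any 5 of the 7 worker results determine it (two stragglers tolerated).
results = [f(e) for e in encoded]
survivors = [0, 2, 3, 5, 6]
decoded = [sum(results[i] * l
               for i, l in zip(survivors,
                               lagrange_basis([betas[i] for i in survivors], a)))
           for a in alphas]

assert np.allclose(decoded, [f(c) for c in chunks])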
Gradient coding: Avoiding stragglers in distributed learning
We propose a novel coding theoretic framework for mitigating stragglers in distributed
learning. We show how carefully replicating data blocks and coding across gradients can …
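A small sketch in the spirit of this abstract: three workers each hold two of three data partitions and send a single coded combination of their partial gradients, so the full gradient sum is recoverable from any two workers, i.e. one straggler is tolerated. The particular coefficients below are one valid choice used for illustration, not necessarily the paper's general construction.

import numpy as np

# Partial gradients over three data partitions (toy values).
g1, g2, g3 = (np.random.randn(5) for _ in range(3))
full_gradient = g1 + g2 + g3

# Each worker computes gradients on two partitions and sends one combination.
send = {
    1: 0.5 * g1 + g2,    # worker 1 holds partitions {1, 2}
    2: g2 - g3,          # worker 2 holds partitions {2, 3}
    3: 0.5 * g1 + g3,    # worker 3 holds partitions {1, 3}
}

# Decoding vectors: any two workers' messages recover the full gradient sum.
decode = {
    frozenset({1, 2}): {1: 2.0, 2: -1.0},
    frozenset({2, 3}): {2: 1.0, 3: 2.0},
    frozenset({1, 3}): {1: 1.0, 3: 1.0},
}

for survivors, coeffs in decode.items():
    recovered = sum(coeffs[w] * send[w] for w in survivors)
    assert np.allclose(recovered, full_gradient)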
Speeding up distributed machine learning using codes
K Lee, M Lam, R Pedarsani… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
Codes are widely used in many engineering applications to offer robustness against noise.
In large-scale systems, there are several types of noise that can affect the performance of …
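As a toy instance of the coded-computation idea studied here, the sketch below protects a distributed matrix-vector product against one straggler by encoding the row blocks of A with a (3, 2) MDS code; any two of the three worker results reconstruct A x. Sizes and the choice of straggler are illustrative.

import numpy as np

A = np.random.randn(6, 4)
x = np.random.randn(4)
A1, A2 = A[:3], A[3:]                       # two row blocks of A
worker_matrices = [A1, A2, A1 + A2]         # (3, 2) MDS encoding, stored offline

# Each worker multiplies its stored block by x; any 2 of the 3 results suffice.
results = {i: M @ x for i, M in enumerate(worker_matrices)}
results.pop(0)                              # pretend worker 0 straggles

# Decode: recover the missing block's result from the parity worker.
b2 = results[1]                             # A2 @ x
b1 = results[2] - b2                        # (A1 + A2) @ x  -  A2 @ x  =  A1 @ x
assert np.allclose(np.concatenate([b1, b2]), A @ x)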
Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding
Q Yu, MA Maddah-Ali… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
We consider the problem of massive matrix multiplication, which underlies many data
analytic applications, in a large-scale distributed system comprising a group of worker …
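The following toy illustrates the polynomial-coding idea behind this line of work: each worker multiplies a single encoded block of A by a single encoded block of B, the blockwise products appear as coefficients of a low-degree matrix polynomial, and only 4 of the 5 workers need to finish. It is a sketch over the reals with arbitrary evaluation points, not the paper's optimal (entangled) construction.

import numpy as np

A = np.random.randn(4, 3)
B = np.random.randn(3, 4)
A_blk = [A[:2], A[2:]]                       # m = 2 row blocks of A
B_blk = [B[:, :2], B[:, 2:]]                 # n = 2 column blocks of B

# Encoding: worker i computes the single product A(x_i) @ B(x_i), where
# A(x) = A0 + A1*x and B(x) = B0 + B1*x^2.
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # 5 workers, recovery threshold 4
products = [(A_blk[0] + A_blk[1] * x) @ (B_blk[0] + B_blk[1] * x**2) for x in xs]

# A(x)B(x) is a degree-3 matrix polynomial with coefficients A0B0, A1B0,
# A0B1, A1B1, so any 4 results determine it by interpolation.
keep = [0, 2, 3, 4]                          # the worker at x = 2 straggles
V = np.vander(xs[keep], 4, increasing=True)
coeffs = np.linalg.solve(V, np.stack([products[i].reshape(-1) for i in keep]))
A0B0, A1B0, A0B1, A1B1 = (c.reshape(2, 2) for c in coeffs)

C = np.block([[A0B0, A0B1], [A1B0, A1B1]])
assert np.allclose(C, A @ B)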
A fundamental tradeoff between computation and communication in distributed computing
How can we optimally trade extra computing power to reduce the communication load in
distributed computing? We answer this question by characterizing a fundamental tradeoff …
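For K nodes and computation load r (each Map computation replicated at r nodes), the tradeoff characterized here takes the form, for integer r,

$L^*(r) = \frac{1}{r}\left(1 - \frac{r}{K}\right), \qquad r \in \{1, \dots, K\},$

with intermediate values obtained by interpolation: multiplying the computation load by a factor reduces the communication load by roughly that factor.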
On the optimal recovery threshold of coded matrix multiplication
S Dutta, M Fahim, F Haddadpour… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
We provide novel coded computation strategies for distributed matrix-matrix products that
outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required …
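A toy instance of the MatDot-style construction developed in this line of work: A is split by columns and B by rows, each worker computes a single product of encoded blocks, and A B appears as one coefficient of a degree-2 matrix polynomial, giving recovery threshold 2m - 1 = 3 for m = 2. Block sizes and evaluation points are illustrative assumptions.

import numpy as np

A = np.random.randn(4, 6)
B = np.random.randn(6, 5)
A_blk = [A[:, :3], A[:, 3:]]                 # m = 2 column blocks of A
B_blk = [B[:3, :], B[3:, :]]                 # matching row blocks of B

# MatDot-style encoding: pA(x) = A0 + A1*x  and  pB(x) = B0*x + B1.
xs = np.array([1.0, 2.0, 3.0, 4.0])          # 4 workers, recovery threshold 3
products = [(A_blk[0] + A_blk[1] * x) @ (B_blk[0] * x + B_blk[1]) for x in xs]

# pA(x)pB(x) has degree 2, and its x^1 coefficient is A0B0 + A1B1 = A @ B.
keep = [0, 1, 3]                             # the worker at x = 3 straggles
V = np.vander(xs[keep], 3, increasing=True)
coeffs = np.linalg.solve(V, np.stack([products[i].reshape(-1) for i in keep]))
assert np.allclose(coeffs[1].reshape(4, 5), A @ B)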
Draco: Byzantine-resilient distributed training via redundant gradients
Distributed model training is vulnerable to Byzantine system failures and adversarial
compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a …
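A toy sketch of the redundancy-plus-filtering idea the abstract describes: each data block's gradient is computed by several workers and the parameter server keeps the answer returned by the majority, so a minority of Byzantine replies is discarded. Draco's actual constructions (fractional-repetition and cyclic codes with efficient decoding) are more elaborate; majority_vote is an illustrative name.

from collections import Counter
import numpy as np

def majority_vote(replica_grads):
    """Keep the gradient reported by the majority of the redundant workers
    assigned to the same data block; Byzantine minorities are outvoted."""
    counts = Counter(tuple(g) for g in replica_grads)
    winner, _ = counts.most_common(1)[0]
    return np.array(winner)

# Three workers redundantly compute the gradient of one block; one is malicious.
true_grad = np.array([0.5, -1.0, 2.0])
replicas = [true_grad, true_grad, np.array([100.0, 100.0, 100.0])]
assert np.allclose(majority_vote(replicas), true_grad)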