Communication-efficient distributed deep learning: A comprehensive survey
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …
Sparsified SGD with memory
SU Stich, JB Cordonnier… - Advances in neural …, 2018 - proceedings.neurips.cc
Huge scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
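The title of this entry refers to what is widely described as top-k gradient sparsification combined with an error-feedback memory: only the largest-magnitude entries of each update are communicated, and the dropped remainder is accumulated locally and re-added before the next step. A minimal single-worker sketch of that idea (the step size, k, and toy objective below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sparsified_sgd_with_memory(grad, x0, lr=0.1, k=2, steps=500):
    """SGD where only a top-k-sparsified update would be communicated;
    the discarded coordinates are kept in a local memory (error feedback)
    and added back before the next compression step."""
    x, mem = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        corrected = lr * grad(x) + mem      # re-add previously dropped mass
        sent = top_k(corrected, k)          # the only part a worker would send
        mem = corrected - sent              # remember what was dropped
        x -= sent                           # apply the (sparse) update
    return x

# Illustrative toy problem (not the paper's setup): minimize ||x - target||^2
# in 10 dimensions while "sending" only 2 coordinates per step.
target = np.arange(10.0)
x_hat = sparsified_sgd_with_memory(lambda x: 2.0 * (x - target), np.zeros(10))
print(np.round(x_hat, 2))   # approaches target despite aggressive sparsification
```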
CoCoA: A general framework for communication-efficient distributed optimization
The scale of modern datasets necessitates the development of efficient distributed
optimization methods for machine learning. We present a general-purpose framework for …
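CoCoA is generally characterized by letting each worker do a substantial amount of local optimization on its own data partition per round, so that only aggregated parameter updates cross the network. The sketch below illustrates that local-work-then-aggregate pattern with a plain gradient solver on a least-squares toy problem; it is not the paper's dual subproblem formulation, and the partitioning, local solver, and averaging rule are all simplifying assumptions:

```python
import numpy as np

def local_solve(A_part, b_part, w, local_steps=20, lr=0.1):
    """Approximately improve the global model on one worker's data shard,
    returning only the parameter delta (what would be communicated)."""
    w_local = w.copy()
    for _ in range(local_steps):
        grad = A_part.T @ (A_part @ w_local - b_part) / len(b_part)
        w_local -= lr * grad
    return w_local - w

def rounds_of_local_work(A, b, n_workers=4, rounds=20):
    """Split rows across workers; per round, aggregate the workers' deltas.
    One aggregation corresponds to one communication round."""
    shards = np.array_split(np.arange(len(b)), n_workers)
    w = np.zeros(A.shape[1])
    for _ in range(rounds):
        deltas = [local_solve(A[idx], b[idx], w) for idx in shards]
        w += np.mean(deltas, axis=0)
    return w

# Illustrative toy data, not from the paper.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
b = A @ w_true
print(np.round(rounds_of_local_work(A, b) - w_true, 2))  # close to zero
```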
FusionAI: Decentralized training and deploying LLMs with massive consumer-level GPUs
The rapid growth of memory and computation requirements of large language models
(LLMs) has outpaced the development of hardware, hindering people who lack large-scale …
Efficient greedy coordinate descent for composite problems
SP Karimireddy, A Koloskova… - The 22nd …, 2019 - proceedings.mlr.press
Coordinate descent with random coordinate selection is the current state of the art for many
large scale optimization problems. However, greedy selection of the steepest coordinate on …
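The greedy rule contrasted here with random selection is often called the Gauss-Southwell rule: at every step, update the coordinate whose partial derivative is largest in magnitude. A minimal sketch on an assumed smooth diagonal quadratic (the paper's composite setting additionally handles a non-smooth term, which is omitted here):

```python
import numpy as np

def greedy_coordinate_descent(grad, x0, lr=0.25, steps=200):
    """Gauss-Southwell rule: at each step, take a gradient step only along
    the coordinate with the largest-magnitude partial derivative."""
    x = x0.copy()
    for _ in range(steps):
        g = grad(x)
        j = int(np.argmax(np.abs(g)))   # steepest coordinate
        x[j] -= lr * g[j]               # update that single coordinate
    return x

# Illustrative objective: 0.5 * x^T D x - b^T x with diagonal D (gradient D x - b).
D = np.array([4.0, 2.0, 1.0, 0.5])
b = np.array([1.0, -2.0, 3.0, 0.5])
x_hat = greedy_coordinate_descent(lambda x: D * x - b, np.zeros(4))
print(np.round(x_hat, 2))   # approaches the minimizer b / D = [0.25, -1., 3., 1.]
```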
Snap ML: A hierarchical framework for machine learning
We describe a new software framework for fast training of generalized linear models. The
framework, named Snap Machine Learning (Snap ML), combines recent advances in …
Coordinate descent with bandit sampling
Coordinate descent methods minimize a cost function by updating a single decision variable
(corresponding to one coordinate) at a time. Ideally, we would update the decision variable …
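The snippet's premise is that one would ideally update the coordinate yielding the most progress, which is expensive to determine exactly; a bandit treatment instead learns per-coordinate progress estimates from past updates and selects coordinates accordingly. The following epsilon-greedy variant is a hypothetical illustration of that idea, with the reward proxy, epsilon, and toy objective all being assumptions rather than the paper's estimator:

```python
import numpy as np

def bandit_coordinate_descent(partial_grad, dim, lr=0.4, eps=0.2, steps=600, seed=0):
    """Coordinate descent with epsilon-greedy bandit coordinate selection:
    exploit the coordinate with the best running progress estimate
    (proxy reward: lr * g_j^2), explore a random coordinate with prob. eps."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    reward = np.full(dim, np.inf)       # optimistic init: try every coordinate once
    for _ in range(steps):
        if rng.random() < eps:
            j = int(rng.integers(dim))  # explore
        else:
            j = int(np.argmax(reward))  # exploit the most promising coordinate
        g_j = partial_grad(x, j)        # only the j-th partial derivative is needed
        x[j] -= lr * g_j
        obs = lr * g_j ** 2             # hypothetical proxy for the achieved decrease
        reward[j] = obs if np.isinf(reward[j]) else 0.7 * reward[j] + 0.3 * obs
    return x

# Illustrative objective: 0.5 * x^T D x - b^T x, so d f / d x_j = D_j x_j - b_j.
D = np.array([4.0, 2.0, 1.0, 0.5])
b = np.array([1.0, -2.0, 3.0, 0.5])
x_hat = bandit_coordinate_descent(lambda x, j: D[j] * x[j] - b[j], dim=4)
print(np.round(x_hat, 2))   # approaches the minimizer b / D = [0.25, -1., 3., 1.]
```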
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly
large language models (LLMs), we present FusionLLM, a decentralized training system …
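The title points to adaptive compression for geo-distributed links. Absent the paper's details, a generic way to illustrate the idea is to tie the compression ratio to the currently measured bandwidth, so slower links send sparser updates; everything below (the bandwidth figures, ratio schedule, and top-k compressor) is a hypothetical sketch rather than FusionLLM's actual scheme:

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def adaptive_compress(update, bandwidth_mbps, full_mbps=1000.0, min_ratio=0.01):
    """Hypothetical rule: keep a fraction of coordinates proportional to the
    measured link bandwidth, never dropping below min_ratio."""
    ratio = max(min_ratio, min(1.0, bandwidth_mbps / full_mbps))
    k = max(1, int(ratio * update.size))
    return top_k(update, k), ratio

# Illustrative usage: the same gradient is compressed differently per link.
grad = np.random.default_rng(0).normal(size=1_000)
for bw in (1000.0, 50.0):               # fast datacenter link vs. slow WAN link
    sent, ratio = adaptive_compress(grad, bandwidth_mbps=bw)
    print(f"bandwidth={bw:6.0f} Mbps -> keep {ratio:.0%} "
          f"({np.count_nonzero(sent)} of {grad.size} coordinates)")
```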
Faster training by selecting samples using embeddings
S Gonzalez, J Landgraf… - 2019 International Joint …, 2019 - ieeexplore.ieee.org
Long training times have increasingly become a burden for researchers by slowing down
the pace of innovation, with some models taking days or weeks to train. In this paper, a new …
SySCD: A system-aware parallel coordinate descent algorithm
N Ioannou, C Mendler-Dünner… - Advances in Neural …, 2019 - proceedings.neurips.cc
In this paper we propose a novel parallel stochastic coordinate descent (SCD) algorithm
with convergence guarantees that exhibits strong scalability. We start by studying a state-of …