Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Sparsified SGD with memory

SU Stich, JB Cordonnier… - Advances in neural …, 2018 - proceedings.neurips.cc
Huge scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
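
The mechanism behind the title, top-k sparsification of the stochastic gradient combined with an error-feedback memory that re-injects the dropped coordinates at later steps, can be sketched in a few lines. The single-worker toy below is only illustrative; names such as `top_k` and `sparsified_sgd_step` and the least-squares example are assumptions, not the authors' reference code.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sparsified_sgd_step(w, grad, memory, lr=0.05, k=10):
    """One step: add the accumulated error to the scaled gradient,
    keep (and, in a distributed run, transmit) only the top-k entries,
    and store the discarded remainder as memory for later steps."""
    update = memory + lr * grad          # error feedback: re-inject previously dropped mass
    sparse_update = top_k(update, k)     # only this sparse vector would be communicated
    memory = update - sparse_update      # residual carried to the next iteration
    return w - sparse_update, memory

# toy usage: least-squares gradients on random data
rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 100)), rng.normal(size=200)
w, memory = np.zeros(100), np.zeros(100)
for _ in range(200):
    grad = A.T @ (A @ w - b) / len(b)
    w, memory = sparsified_sgd_step(w, grad, memory, lr=0.05, k=10)
```

In a distributed run only `sparse_update` would be sent over the network; the memory ensures coordinates dropped by the compressor are not lost but applied in later iterations.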

CoCoA: A general framework for communication-efficient distributed optimization

V Smith, S Forte, C Ma, M Takáč, MI Jordan… - Journal of Machine …, 2018 - jmlr.org
The scale of modern datasets necessitates the development of efficient distributed
optimization methods for machine learning. We present a general-purpose framework for …
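
The communication pattern that CoCoA generalizes, an approximate local subproblem solved on each worker's data partition followed by aggregation of the resulting updates, can be illustrated with ridge regression and a local SDCA solver. The sketch below is only a schematic of that pattern under assumed names and constants, not the reference CoCoA implementation.

```python
import numpy as np

def local_sdca(X, y, alpha, w, lam, n_total, steps, rng):
    """Local dual coordinate ascent (SDCA) on one worker's example partition
    for the objective (1/n) * sum_i 0.5*(x_i @ w - y_i)**2 + (lam/2)*||w||^2.
    Returns the proposed local dual and primal changes (not yet applied)."""
    d_alpha = np.zeros_like(alpha)
    d_w = np.zeros_like(w)
    for _ in range(steps):
        i = int(rng.integers(len(y)))
        x_i = X[i]
        # closed-form dual coordinate step for the squared loss
        g = y[i] - x_i @ (w + d_w) - (alpha[i] + d_alpha[i])
        step = g / (1.0 + x_i @ x_i / (lam * n_total))
        d_alpha[i] += step
        d_w += step * x_i / (lam * n_total)
    return d_alpha, d_w

def cocoa_style(X, y, lam=0.1, n_workers=4, rounds=20, local_steps=100, seed=0):
    """Ridge regression via a CoCoA-style local-solve / aggregate loop."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(n), n_workers)   # partition examples
    alphas = [np.zeros(len(p)) for p in parts]              # local dual variables
    w = np.zeros(d)                                         # shared primal vector
    for _ in range(rounds):
        updates = [local_sdca(X[p], y[p], alphas[k], w, lam, n, local_steps, rng)
                   for k, p in enumerate(parts)]            # would run in parallel
        for k, (d_alpha, d_w) in enumerate(updates):
            alphas[k] += d_alpha / n_workers                # conservative (averaging)
            w += d_w / n_workers                            # aggregation of the updates
    return w

# toy usage
rng = np.random.default_rng(3)
X, y = rng.normal(size=(400, 20)), rng.normal(size=400)
w_hat = cocoa_style(X, y)
```

Averaging the local updates (dividing by the number of workers) is the conservative aggregation choice; the CoCoA line of work analyzes when more aggressive adding of updates is safe.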

FusionAI: Decentralized training and deploying LLMs with massive consumer-level GPUs

Z Tang, Y Wang, X He, L Zhang, X Pan, Q Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid growth of memory and computation requirements of large language models
(LLMs) has outpaced the development of hardware, hindering people who lack large-scale …

Efficient greedy coordinate descent for composite problems

SP Karimireddy, A Koloskova… - The 22nd …, 2019 - proceedings.mlr.press
Coordinate descent with random coordinate selection is the current state of the art for many
large scale optimization problems. However, greedy selection of the steepest coordinate on …
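
As a point of reference for the "steepest coordinate" idea, here is a minimal greedy coordinate descent sketch for a lasso-type composite objective, using a GS-r-style rule that picks the coordinate whose proximal step moves the most. The rule and all names are illustrative; the paper's contribution concerns efficient and principled greedy selection for composite problems, which this toy does not reproduce.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def greedy_cd(A, b, lam=0.1, iters=200):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1, one coordinate per iteration,
    chosen greedily by the size of its prospective prox step (GS-r-style)."""
    n = A.shape[1]
    x = np.zeros(n)
    L = (A ** 2).sum(axis=0)                 # per-coordinate Lipschitz constants
    residual = A @ x - b
    for _ in range(iters):
        grad = A.T @ residual                # gradient of the smooth part
        # candidate prox step for every coordinate
        step = soft_threshold(x - grad / L, lam / L) - x
        i = int(np.argmax(np.abs(step)))     # greedy: the coordinate that moves most
        residual += A[:, i] * step[i]
        x[i] += step[i]
    return x

# toy usage
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 50)), rng.normal(size=100)
x_hat = greedy_cd(A, b)
```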

Snap ML: A hierarchical framework for machine learning

C Dünner, T Parnell, D Sarigiannis… - Advances in …, 2018 - proceedings.neurips.cc
We describe a new software framework for fast training of generalized linear models. The
framework, named Snap Machine Learning (Snap ML), combines recent advances in …

Coordinate descent with bandit sampling

F Salehi, P Thiran, E Celis - Advances in Neural …, 2018 - proceedings.neurips.cc
Coordinate descent methods minimize a cost function by updating a single decision variable
(corresponding to one coordinate) at a time. Ideally, we would update the decision variable …
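
One way to picture "bandit sampling" here is to treat each coordinate as an arm whose reward is the objective decrease it produced when last updated, trading off exploration of untried coordinates against exploitation of the most promising one. The epsilon-greedy sketch below is only a schematic reading of the title; the paper develops a specific bandit sampler with guarantees, and every name and constant here is an assumption.

```python
import numpy as np

def bandit_cd(A, b, iters=500, eps=0.2, seed=0):
    """Minimize 0.5*||Ax - b||^2, choosing coordinates with an epsilon-greedy
    bandit over each coordinate's last observed objective decrease."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    L = (A ** 2).sum(axis=0)               # coordinate-wise Lipschitz constants
    reward = np.full(n, np.inf)            # optimistic init: every coordinate gets tried
    residual = A @ x - b
    for _ in range(iters):
        if rng.random() < eps:
            i = int(rng.integers(n))       # explore a random coordinate
        else:
            i = int(np.argmax(reward))     # exploit the most promising coordinate
        g = A[:, i] @ residual             # partial derivative along coordinate i
        step = -g / L[i]                   # exact coordinate-wise minimization
        before = 0.5 * residual @ residual
        residual += A[:, i] * step
        x[i] += step
        reward[i] = before - 0.5 * residual @ residual  # observed decrease = reward
    return x

# toy usage
rng = np.random.default_rng(2)
A, b = rng.normal(size=(120, 60)), rng.normal(size=120)
x_hat = bandit_cd(A, b)
```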

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression

Z Tang, X Kang, Y Yin, X Pan, Y Wang, X He… - arXiv preprint arXiv …, 2024 - arxiv.org
To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly
large language models (LLMs), we present FusionLLM, a decentralized training system …

Faster training by selecting samples using embeddings

S Gonzalez, J Landgraf… - 2019 International Joint …, 2019 - ieeexplore.ieee.org
Long training times have increasingly become a burden for researchers by slowing down
the pace of innovation, with some models taking days or weeks to train. In this paper, a new …

SySCD: A system-aware parallel coordinate descent algorithm

N Ioannou, C Mendler-Dünner… - Advances in Neural …, 2019 - proceedings.neurips.cc
In this paper we propose a novel parallel stochastic coordinate descent (SCD) algorithm
with convergence guarantees that exhibits strong scalability. We start by studying a state-of …