A survey on distributed machine learning
J Verbraeken, M Wolting, J Katzy… - ACM Computing Surveys …, 2020 - dl.acm.org
The demand for artificial intelligence has grown significantly over the past decade, and this
growth has been fueled by advances in machine learning techniques and the ability to …
Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools
R Mayer, HA Jacobsen - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) has had immense success in the recent past, leading to state-of-the-
art results in various domains, such as image recognition and natural language processing …
PipeDream: Generalized pipeline parallelism for DNN training
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization. Current approaches to parallelizing training primarily use intra-batch …
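The snippet contrasts intra-batch parallelism with the pipeline parallelism that PipeDream generalizes. As a rough illustration of the pipeline idea only (not PipeDream's actual 1F1B scheduler), the sketch below splits a model into stages and streams microbatches through them so that different stages work on different microbatches in the same "clock tick"; the stage functions and tick loop are assumptions for this toy, single-process simulation.

```python
# Hypothetical sketch of pipeline-parallel scheduling: a model is split into
# stages, a minibatch is split into microbatches, and each stage processes
# microbatch i while the next stage works on microbatch i-1. The "stages"
# here are plain functions and the pipeline is simulated step by step.

def run_pipeline(stages, microbatches):
    """Simulate one forward pass of a stage-partitioned model.

    stages:       list of callables; stages[k] consumes stages[k-1]'s output
    microbatches: list of input chunks (a minibatch split into pieces)
    """
    num_stages = len(stages)
    num_micro = len(microbatches)
    in_flight = [None] * num_stages   # activation sitting at each stage's input
    outputs = []

    # Each tick, every stage that has an input processes one microbatch;
    # on real hardware these per-stage steps run concurrently on different GPUs.
    for tick in range(num_micro + num_stages - 1):
        if tick < num_micro:
            in_flight[0] = microbatches[tick]     # inject the next microbatch
        # Process from the last stage backwards so inputs are not overwritten.
        for k in reversed(range(num_stages)):
            if in_flight[k] is not None:
                result = stages[k](in_flight[k])
                in_flight[k] = None
                if k + 1 < num_stages:
                    in_flight[k + 1] = result
                else:
                    outputs.append(result)
    return outputs


if __name__ == "__main__":
    # Two toy "stages" and four microbatches; real systems place each stage on
    # its own accelerator and also pipeline the backward pass.
    stages = [lambda x: x * 2, lambda x: x + 1]
    print(run_pipeline(stages, [1, 2, 3, 4]))  # [3, 5, 7, 9]
```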
Dorylus: Affordable, scalable, and accurate GNN training with distributed CPU servers and serverless threads
A graph neural network (GNN) enables deep learning on structured graph data. There are
two major obstacles to GNN training: 1) it relies on high-end servers with many GPUs, which …
Gaia: Geo-distributed machine learning approaching LAN speeds
Machine learning (ML) is widely used to derive useful information from large-scale data
(such as user activities, pictures, and videos) generated at increasingly rapid rates, all over …
Cooperative SGD: A unified framework for the design and analysis of local-update SGD algorithms
When training machine learning models using stochastic gradient descent (SGD) with a
large number of nodes or massive edge devices, the communication cost of synchronizing …
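The local-update SGD pattern that Cooperative SGD unifies has each worker take several SGD steps on its own data before the workers average their model copies, reducing how often they must synchronize. The sketch below is a minimal, single-process illustration of that pattern on a toy least-squares problem; the shard construction and the values of tau, num_workers, and lr are assumptions, not settings from the paper.

```python
# Hypothetical sketch of local-update SGD: each worker runs tau local SGD
# steps on its own shard, then all workers average their models, cutting
# communication by roughly a factor of tau versus per-step synchronization.
import numpy as np

rng = np.random.default_rng(0)
num_workers, tau, rounds, lr, dim = 4, 8, 50, 0.05, 5

# Ground-truth weights and per-worker data shards (toy least-squares data).
w_true = rng.normal(size=dim)
shards = []
for _ in range(num_workers):
    X = rng.normal(size=(64, dim))
    y = X @ w_true + 0.01 * rng.normal(size=64)
    shards.append((X, y))

# Every worker starts from the same initial model.
models = [np.zeros(dim) for _ in range(num_workers)]

for _ in range(rounds):
    # Local phase: each worker takes tau SGD steps on minibatches of its shard.
    for k, (X, y) in enumerate(shards):
        w = models[k]
        for _ in range(tau):
            idx = rng.integers(0, len(y), size=8)              # minibatch
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w = w - lr * grad
        models[k] = w
    # Communication phase: average all local models (an all-reduce in practice).
    avg = np.mean(models, axis=0)
    models = [avg.copy() for _ in range(num_workers)]

print("error:", np.linalg.norm(models[0] - w_true))
```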
Adaptive communication strategies to achieve the best error-runtime trade-off in local-update SGD
Large-scale machine learning training, in particular distributed stochastic gradient descent,
needs to be robust to inherent system variability such as node straggling and random …
Geeps: Scalable deep learning on distributed gpus with a gpu-specialized parameter server
Large-scale deep learning requires huge computational resources to train a multi-layer
neural network. Recent systems propose using 100s to 1000s of machines to train networks …
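GeePS builds on the parameter-server architecture, in which a server holds the canonical parameters and workers repeatedly pull them, compute gradients on local minibatches, and push the gradients back. The sketch below illustrates only that generic pull/push loop in one process; it does not model GeePS's GPU-specialized caching, data placement, or staleness control, and all names and hyperparameters are assumptions.

```python
# Hypothetical sketch of the parameter-server pattern: the server owns the
# parameters; workers pull them, compute gradients, and push updates back.
import numpy as np

class ParameterServer:
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr

    def pull(self):
        return self.w.copy()          # worker fetches current parameters

    def push(self, grad):
        self.w -= self.lr * grad      # server applies the gradient update

def worker_gradient(w, X, y):
    # Least-squares gradient on this worker's minibatch.
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(1)
dim, num_workers = 5, 3
w_true = rng.normal(size=dim)
server = ParameterServer(dim)

for step in range(200):
    # Round-robin stand-in for workers that would run in parallel processes.
    for k in range(num_workers):
        X = rng.normal(size=(16, dim))
        y = X @ w_true
        grad = worker_gradient(server.pull(), X, y)
        server.push(grad)

print("error:", np.linalg.norm(server.w - w_true))
```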
PipeDream: Fast and efficient pipeline parallel DNN training
PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes
computation by pipelining execution across multiple machines. Its pipeline parallel …
SwapAdvisor: Pushing deep learning beyond the GPU memory limit via smart swapping
It is known that deeper and wider neural networks can achieve better accuracy, but it is
difficult to continue this trend of increasing model size due to limited GPU memory. One …
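The underlying idea of tensor swapping is that activations or weights not needed right now can be moved to host memory and brought back before they are used again, letting the working set exceed GPU memory. The toy planner below only tracks sizes and prints a greedy FIFO swap schedule under a memory budget; it is an assumption-laden illustration, not SwapAdvisor's actual scheduling, memory-allocation, and operator-placement search.

```python
# Hypothetical sketch of swap planning: keep activations on the GPU until a
# memory budget is exceeded, then swap out the oldest resident activation to
# host memory. Sizes, budget, and the FIFO policy are illustrative choices.

def plan_swaps(activation_sizes_mb, budget_mb):
    """Return a (event, activation index) schedule under a GPU memory budget."""
    resident = []          # indices of activations currently on the GPU
    used = 0.0
    schedule = []
    for i, size in enumerate(activation_sizes_mb):
        # Evict the oldest resident activations until the new one fits
        # (if a single activation exceeds the budget, it is kept anyway).
        while resident and used + size > budget_mb:
            victim = resident.pop(0)
            used -= activation_sizes_mb[victim]
            schedule.append(("swap_out", victim))
        resident.append(i)
        used += size
        schedule.append(("keep", i))
    return schedule

if __name__ == "__main__":
    # Per-layer activation sizes (MB) for a toy network, with a tight budget.
    sizes = [400, 300, 500, 200, 600]
    for event, idx in plan_swaps(sizes, budget_mb=1000):
        print(f"{event:9s} activation {idx}")
```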