Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …

A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to
its unique features, the GPU continues to remain the most widely used accelerator for DL …

A linear speedup analysis of distributed deep learning with sparse and quantized communication

P Jiang, G Agrawal - Advances in Neural Information …, 2018 - proceedings.neurips.cc
The large communication overhead has imposed a bottleneck on the performance of
distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous …

HPC cloud for scientific and business applications: taxonomy, vision, and research challenges

MAS Netto, RN Calheiros, ER Rodrigues… - ACM Computing …, 2018 - dl.acm.org
High performance computing (HPC) clouds are becoming an alternative to on-premise
clusters for executing scientific applications and business analytics services. Most research …

Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Deep learning application in plant stress imaging: a review

Z Gao, Z Luo, W Zhang, Z Lv, Y Xu - AgriEngineering, 2020 - mdpi.com
Plant stress is one of the major issues that cause significant economic loss for growers. The
labor-intensive conventional methods for identifying the stressed plants constrain their …

SparCML: High-performance sparse communication for machine learning

C Renggli, S Ashkboos, M Aghagolzadeh… - Proceedings of the …, 2019 - dl.acm.org
Applying machine learning techniques to the quickly growing data in science and industry
requires highly-scalable algorithms. Large datasets are most commonly processed "data …

Advancements in accelerating deep neural network inference on AIoT devices: A survey

L Cheng, Y Gu, Q Liu, L Yang, C Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The amalgamation of artificial intelligence with Internet of Things (AIoT) devices has seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …

The MVAPICH project: Transforming research into high-performance MPI library for HPC community

DK Panda, H Subramoni, CH Chu… - Journal of Computational …, 2021 - Elsevier
High-Performance Computing (HPC) research, from hardware and software to the
end applications, provides remarkable computing power to help scientists solve complex …

Performance modeling and evaluation of distributed deep learning frameworks on GPUs

S Shi, Q Wang, X Chu - 2018 IEEE 16th Intl Conf on …, 2018 - ieeexplore.ieee.org
Deep learning frameworks have been widely deployed on GPU servers for deep learning
applications in both academia and industry. In training deep neural networks (DNNs), there …