Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
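
Many of the techniques such surveys cover start from data parallelism: replicas compute gradients on disjoint mini-batch shards and average them before applying one shared update. A minimal single-process NumPy sketch (the two-replica split, linear model, and learning rate are illustrative assumptions, not anything from the survey):

import numpy as np

# Hypothetical setup: a linear model trained by two data-parallel replicas.
rng = np.random.default_rng(0)
w = np.zeros(4)                        # shared model parameters
x = rng.normal(size=(8, 4))            # mini-batch of 8 examples
y = x @ np.array([1.0, -2.0, 0.5, 3.0])

shards = np.array_split(np.arange(8), 2)   # each replica gets half the batch
grads = []
for idx in shards:
    xb, yb = x[idx], y[idx]
    err = xb @ w - yb
    grads.append(xb.T @ err / len(idx))    # local gradient of squared loss

g = np.mean(grads, axis=0)             # "allreduce": average replica gradients
w -= 0.1 * g                           # identical update on every replica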

Optimization and acceleration of convolutional neural networks: A survey

G Habib, S Qureshi - Journal of King Saud University-Computer and …, 2022 - Elsevier
Convolutional neural networks (CNNs) are a specialized case of artificial neural
networks (ANNs) and find application in computer vision and parallel distributed computing for …

Convolutional neural networks: A survey

M Krichen - Computers, 2023 - mdpi.com
Artificial intelligence (AI) has become a cornerstone of modern technology, revolutionizing
industries from healthcare to finance. Convolutional neural networks (CNNs) are a subset of …

Data movement is all you need: A case study on optimizing transformers

A Ivanov, N Dryden, T Ben-Nun, S Li… - … of Machine Learning …, 2021 - proceedings.mlsys.org
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …
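
As a rough back-of-the-envelope illustration of why data movement, rather than compute, can dominate such workloads: comparing FLOPs to bytes moved for one dense layer gives its arithmetic intensity. All sizes and the fp16 assumption below are invented for illustration, not taken from the paper:

# Rough arithmetic-intensity estimate for one dense layer of a transformer.
batch, seq, d_model = 8, 512, 1024
dtype_bytes = 2                               # fp16

flops = 2 * batch * seq * d_model * d_model   # GEMM: (B*S, d) x (d, d)
bytes_moved = dtype_bytes * (
    batch * seq * d_model          # read activations
    + d_model * d_model            # read weights
    + batch * seq * d_model        # write output
)
intensity = flops / bytes_moved               # FLOPs per byte
print(f"{flops/1e9:.1f} GFLOP, {bytes_moved/1e6:.1f} MB, "
      f"intensity ~ {intensity:.0f} FLOP/byte")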

Reducing communication in graph neural network training

A Tripathy, K Yelick, A Buluç - SC20: International Conference …, 2020 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the
naturally sparse connectivity information of the data. GNNs represent this connectivity as …
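
A GNN layer's neighbor aggregation is essentially a sparse-dense matrix product with the adjacency matrix, which is what makes distributed training communication-bound: rows of the feature matrix live on different workers. A minimal single-process SciPy sketch (the graph and feature sizes are invented for illustration):

import numpy as np
import scipy.sparse as sp

# Tiny illustrative graph: 5 nodes, directed edges (src -> dst).
src = np.array([0, 1, 2, 3, 4, 0])
dst = np.array([1, 2, 3, 4, 0, 2])
n = 5

# Sparse adjacency; GNN aggregation is then a sparse-dense product A @ H.
a = sp.csr_matrix((np.ones(len(src)), (dst, src)), shape=(n, n))
h = np.random.default_rng(0).normal(size=(n, 8))   # node features

h_agg = a @ h    # each row: sum of that node's in-neighbor features

# In distributed training, rows of h are partitioned across workers, so
# this product is what forces feature communication between partitions.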

Orchestrating the development lifecycle of machine learning-based IoT applications: A taxonomy and survey

B Qian, J Su, Z Wen, DN Jha, Y Li, Y Guan… - ACM Computing …, 2020 - dl.acm.org
Machine Learning (ML) and Internet of Things (IoT) are complementary advances: ML
techniques unlock the potential of IoT with intelligence, and IoT applications increasingly …

Efficient combination of rematerialization and offloading for training DNNs

O Beaumont, L Eyraud-Dubois… - Advances in Neural …, 2021 - proceedings.neurips.cc
Rematerialization and offloading are two well known strategies to save memory during the
training phase of deep neural networks, allowing data scientists to consider larger models …
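
Not the paper's combined schedule, just a minimal PyTorch sketch of the two ingredients separately: rematerialization via torch.utils.checkpoint (recompute activations during backward) and offloading via torch.autograd.graph.save_on_cpu (stage saved activations on the host). The model and sizes are illustrative assumptions:

import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
)

x = torch.randn(64, 1024, requires_grad=True)

# Rematerialization: activations inside `block` are not stored; they are
# recomputed during the backward pass, trading extra compute for memory.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()

# Offloading: saved activations are parked on the CPU between forward and
# backward (this only saves memory when the tensors live on a GPU).
with torch.autograd.graph.save_on_cpu():
    z = block(torch.randn(64, 1024, requires_grad=True))
z.sum().backward()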

GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training

A Jain, AA Awan, AM Aljuhani… - … Conference for High …, 2020 - ieeexplore.ieee.org
Data-parallelism has become an established paradigm to train DNNs that fit inside GPU
memory on large-scale HPC systems. However, model-parallelism is required to train out-of …
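
Not GEMS itself, but a minimal PyTorch sketch of the underlying idea of model parallelism: placing consecutive stages of a model on different devices and moving activations across the boundary. The two-stage split and layer sizes are illustrative, and the devices fall back to CPU so the sketch runs anywhere:

import torch

dev0 = "cuda:0" if torch.cuda.device_count() > 0 else "cpu"
dev1 = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"

stage0 = torch.nn.Linear(1024, 4096).to(dev0)   # first half of the model
stage1 = torch.nn.Linear(4096, 10).to(dev1)     # second half

x = torch.randn(32, 1024, device=dev0)
h = stage0(x)              # runs on device 0
out = stage1(h.to(dev1))   # activations cross the device boundary
out.sum().backward()       # autograd routes gradients back across devices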

Channel and filter parallelism for large-scale CNN training

N Dryden, N Maruyama, T Moon, T Benson… - Proceedings of the …, 2019 - dl.acm.org
Accelerating large-scale CNN training is needed to keep training times reasonable as
datasets grow larger and models become more complex. Existing frameworks primarily …
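
One way to read "filter parallelism" is partitioning a convolution's output channels across workers. A single-process PyTorch sketch showing that concatenating the partial outputs reproduces the full layer (the two-way split and tensor sizes are illustrative assumptions, not the paper's setup):

import torch
import torch.nn.functional as F

# Each of two "workers" owns half of a conv layer's 64 filters.
full = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)
w0, w1 = full.weight.chunk(2, dim=0)      # split filters across workers
b0, b1 = full.bias.chunk(2, dim=0)

x = torch.randn(8, 3, 32, 32)
y0 = F.conv2d(x, w0, b0, padding=1)       # worker 0's output channels
y1 = F.conv2d(x, w1, b1, padding=1)       # worker 1's output channels

y = torch.cat([y0, y1], dim=1)            # "all-gather" along channels
assert torch.allclose(y, full(x), atol=1e-6)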

Sequential aggregation and rematerialization: Distributed full-batch training of graph neural networks on large graphs

H Mostafa - Proceedings of Machine Learning and Systems, 2022 - proceedings.mlsys.org
We present the Sequential Aggregation and Rematerialization (SAR) scheme for
distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large …
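
A loose single-process sketch of the sequential-aggregation idea, under invented partition sizes and random adjacency blocks rather than the paper's actual SAR scheme: aggregate one remote partition's features at a time, so peak memory holds only one partition's features plus the accumulator:

import numpy as np

rng = np.random.default_rng(0)
n_parts, nodes_per_part, feat = 4, 100, 16

# Features held by each remote partition, and the adjacency block that
# connects each remote partition to the 50 local nodes.
parts = [rng.normal(size=(nodes_per_part, feat)) for _ in range(n_parts)]
blocks = [rng.random((50, nodes_per_part)) < 0.05 for _ in range(n_parts)]

acc = np.zeros((50, feat))
for a_blk, h_blk in zip(blocks, parts):   # one partition resident at a time
    acc += a_blk.astype(float) @ h_blk    # aggregate, then discard h_blk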