Demystifying parallel and distributed deep learning: An in-depth concurrency analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
Optimization and acceleration of convolutional neural networks: A survey
Convolutional neural networks (CNNs) are a specialized case of artificial neural networks
(ANNs) and find application in computer vision and parallel distributed computing for …
Convolutional neural networks: A survey
M Krichen - Computers, 2023 - mdpi.com
Artificial intelligence (AI) has become a cornerstone of modern technology, revolutionizing
industries from healthcare to finance. Convolutional neural networks (CNNs) are a subset of …
Data movement is all you need: A case study on optimizing transformers
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …
Reducing communication in graph neural network training
Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the
naturally sparse connectivity information of the data. GNNs represent this connectivity as …
Orchestrating the development lifecycle of machine learning-based IoT applications: A taxonomy and survey
Machine Learning (ML) and Internet of Things (IoT) are complementary advances: ML
techniques unlock the potential of IoT with intelligence, and IoT applications increasingly …
Efficient combination of rematerialization and offloading for training DNNs
O Beaumont, L Eyraud-Dubois… - Advances in Neural …, 2021 - proceedings.neurips.cc
Rematerialization and offloading are two well known strategies to save memory during the
training phase of deep neural networks, allowing data scientists to consider larger models …
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training
Data-parallelism has become an established paradigm to train DNNs that fit inside GPU
memory on large-scale HPC systems. However, model-parallelism is required to train out-of …
Channel and filter parallelism for large-scale CNN training
N Dryden, N Maruyama, T Moon, T Benson… - Proceedings of the …, 2019 - dl.acm.org
Accelerating large-scale CNN training is needed to keep training times reasonable as
datasets grow larger and models become more complex. Existing frameworks primarily …
Sequential aggregation and rematerialization: Distributed full-batch training of graph neural networks on large graphs
H Mostafa - Proceedings of Machine Learning and Systems, 2022 - proceedings.mlsys.org
We present the Sequential Aggregation and Rematerialization (SAR) scheme for
distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large …