Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
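
Many of the techniques such surveys cover start from data parallelism: replicas compute gradients on disjoint mini-batch shards and average them before applying one shared update. A minimal single-process NumPy sketch (the two-replica split, linear model, and learning rate are illustrative assumptions, not anything from the survey):

import numpy as np

# Hypothetical setup: a linear model trained by two data-parallel replicas.
rng = np.random.default_rng(0)
w = np.zeros(4)                        # shared model parameters
x = rng.normal(size=(8, 4))            # mini-batch of 8 examples
y = x @ np.array([1.0, -2.0, 0.5, 3.0])

shards = np.array_split(np.arange(8), 2)   # each replica gets half the batch
grads = []
for idx in shards:
    xb, yb = x[idx], y[idx]
    err = xb @ w - yb
    grads.append(xb.T @ err / len(idx))    # local gradient of squared loss

g = np.mean(grads, axis=0)             # "allreduce": average replica gradients
w -= 0.1 * g                           # identical update on every replica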

Optimization and acceleration of convolutional neural networks: A survey

G Habib, S Qureshi - Journal of King Saud University-Computer and …, 2022 - Elsevier
Convolutional neural networks (CNNs) are a specialized case of artificial neural
networks (ANNs) and find application in computer vision and parallel distributed computing for …

Convolutional neural networks: A survey

M Krichen - Computers, 2023 - mdpi.com
Artificial intelligence (AI) has become a cornerstone of modern technology, revolutionizing
industries from healthcare to finance. Convolutional neural networks (CNNs) are a subset of …

Data movement is all you need: A case study on optimizing transformers

A Ivanov, N Dryden, T Ben-Nun, S Li… - … of Machine Learning …, 2021 - proceedings.mlsys.org
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …
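
As a rough back-of-the-envelope illustration of why data movement, rather than compute, can dominate such workloads: comparing FLOPs to bytes moved for one dense layer gives its arithmetic intensity. All sizes and the fp16 assumption below are invented for illustration, not taken from the paper:

# Rough arithmetic-intensity estimate for one dense layer of a transformer.
batch, seq, d_model = 8, 512, 1024
dtype_bytes = 2                               # fp16

flops = 2 * batch * seq * d_model * d_model   # GEMM: (B*S, d) x (d, d)
bytes_moved = dtype_bytes * (
    batch * seq * d_model          # read activations
    + d_model * d_model            # read weights
    + batch * seq * d_model        # write output
)
intensity = flops / bytes_moved               # FLOPs per byte
print(f"{flops/1e9:.1f} GFLOP, {bytes_moved/1e6:.1f} MB, "
      f"intensity ~ {intensity:.0f} FLOP/byte")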

Reducing communication in graph neural network training

A Tripathy, K Yelick, A Buluç - SC20: International Conference …, 2020 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the
naturally sparse connectivity information of the data. GNNs represent this connectivity as …
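
A GNN layer's neighbor aggregation is essentially a sparse-dense matrix product with the adjacency matrix, which is what makes distributed training communication-bound: rows of the feature matrix live on different workers. A minimal single-process SciPy sketch (the graph and feature sizes are invented for illustration):

import numpy as np
import scipy.sparse as sp

# Tiny illustrative graph: 5 nodes, directed edges (src -> dst).
src = np.array([0, 1, 2, 3, 4, 0])
dst = np.array([1, 2, 3, 4, 0, 2])
n = 5

# Sparse adjacency; GNN aggregation is then a sparse-dense product A @ H.
a = sp.csr_matrix((np.ones(len(src)), (dst, src)), shape=(n, n))
h = np.random.default_rng(0).normal(size=(n, 8))   # node features

h_agg = a @ h    # each row: sum of that node's in-neighbor features

# In distributed training, rows of h are partitioned across workers, so
# this product is what forces feature communication between partitions.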

Orchestrating the development lifecycle of machine learning-based IoT applications: A taxonomy and survey

B Qian, J Su, Z Wen, DN Jha, Y Li, Y Guan… - ACM Computing …, 2020 - dl.acm.org
Machine Learning (ML) and Internet of Things (IoT) are complementary advances: ML
techniques unlock the potential of IoT with intelligence, and IoT applications increasingly …

Efficient combination of rematerialization and offloading for training DNNs

O Beaumont, L Eyraud-Dubois… - Advances in Neural …, 2021 - proceedings.neurips.cc
Rematerialization and offloading are two well known strategies to save memory during the
training phase of deep neural networks, allowing data scientists to consider larger models …
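
Not the paper's combined schedule, just a minimal PyTorch sketch of the two ingredients separately: rematerialization via torch.utils.checkpoint (recompute activations during backward) and offloading via torch.autograd.graph.save_on_cpu (stage saved activations on the host). The model and sizes are illustrative assumptions:

import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
)

x = torch.randn(64, 1024, requires_grad=True)

# Rematerialization: activations inside `block` are not stored; they are
# recomputed during the backward pass, trading extra compute for memory.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()

# Offloading: saved activations are parked on the CPU between forward and
# backward (this only saves memory when the tensors live on a GPU).
with torch.autograd.graph.save_on_cpu():
    z = block(torch.randn(64, 1024, requires_grad=True))
z.sum().backward()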

GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training

A Jain, AA Awan, AM Aljuhani… - … Conference for High …, 2020 - ieeexplore.ieee.org
Data-parallelism has become an established paradigm to train DNNs that fit inside GPU
memory on large-scale HPC systems. However, model-parallelism is required to train out-of …
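
Not GEMS itself, but a minimal PyTorch sketch of the underlying idea of model parallelism: placing consecutive stages of a model on different devices and moving activations across the boundary. The two-stage split and layer sizes are illustrative, and the devices fall back to CPU so the sketch runs anywhere:

import torch

dev0 = "cuda:0" if torch.cuda.device_count() > 0 else "cpu"
dev1 = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"

stage0 = torch.nn.Linear(1024, 4096).to(dev0)   # first half of the model
stage1 = torch.nn.Linear(4096, 10).to(dev1)     # second half

x = torch.randn(32, 1024, device=dev0)
h = stage0(x)              # runs on device 0
out = stage1(h.to(dev1))   # activations cross the device boundary
out.sum().backward()       # autograd routes gradients back across devices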

Channel and filter parallelism for large-scale CNN training

N Dryden, N Maruyama, T Moon, T Benson… - Proceedings of the …, 2019 - dl.acm.org
Accelerating large-scale CNN training is needed to keep training times reasonable as
datasets grow larger and models become more complex. Existing frameworks primarily …
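
One way to read "filter parallelism" is partitioning a convolution's output channels across workers. A single-process PyTorch sketch showing that concatenating the partial outputs reproduces the full layer (the two-way split and tensor sizes are illustrative assumptions, not the paper's setup):

import torch
import torch.nn.functional as F

# Each of two "workers" owns half of a conv layer's 64 filters.
full = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)
w0, w1 = full.weight.chunk(2, dim=0)      # split filters across workers
b0, b1 = full.bias.chunk(2, dim=0)

x = torch.randn(8, 3, 32, 32)
y0 = F.conv2d(x, w0, b0, padding=1)       # worker 0's output channels
y1 = F.conv2d(x, w1, b1, padding=1)       # worker 1's output channels

y = torch.cat([y0, y1], dim=1)            # "all-gather" along channels
assert torch.allclose(y, full(x), atol=1e-6)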

Sequential aggregation and rematerialization: Distributed full-batch training of graph neural networks on large graphs

H Mostafa - Proceedings of Machine Learning and Systems, 2022 - proceedings.mlsys.org
We present the Sequential Aggregation and Rematerialization (SAR) scheme for
distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large …
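
A loose single-process sketch of the sequential-aggregation idea, under invented partition sizes and random adjacency blocks rather than the paper's actual SAR scheme: aggregate one remote partition's features at a time, so peak memory holds only one partition's features plus the accumulator:

import numpy as np

rng = np.random.default_rng(0)
n_parts, nodes_per_part, feat = 4, 100, 16

# Features held by each remote partition, and the adjacency block that
# connects each remote partition to the 50 local nodes.
parts = [rng.normal(size=(nodes_per_part, feat)) for _ in range(n_parts)]
blocks = [rng.random((50, nodes_per_part)) < 0.05 for _ in range(n_parts)]

acc = np.zeros((50, feat))
for a_blk, h_blk in zip(blocks, parts):   # one partition resident at a time
    acc += a_blk.astype(float) @ h_blk    # aggregate, then discard h_blk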