A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep learning (DL) has been fuelled by improvements in accelerators. Due to
its unique features, the GPU remains the most widely used accelerator for DL …

Automated design of deep neural networks: A survey and unified taxonomy

EG Talbi - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
In recent years, research on applying optimization approaches to the automatic design of
deep neural networks has become increasingly popular. Although various approaches have …

Pipe-SGD: A decentralized pipelined SGD framework for distributed deep net training

Y Li, M Yu, S Li, S Avestimehr… - Advances in Neural …, 2018 - proceedings.neurips.cc
Distributed training of deep nets is an important technique for addressing some of the
present-day computing challenges such as memory consumption and computational demands. Classical …

Clairvoyant prefetching for distributed machine learning I/O

N Dryden, R Böhringer, T Ben-Nun… - Proceedings of the …, 2021 - dl.acm.org
I/O is emerging as a major bottleneck for machine learning training, especially in distributed
environments. Indeed, at large scale, I/O takes as much as 85% of training time. Addressing …

Toward performance-portable PETSc for GPU-based exascale systems

RT Mills, MF Adams, S Balay, J Brown, A Dener… - Parallel Computing, 2021 - Elsevier
The Portable, Extensible Toolkit for Scientific Computation (PETSc) library delivers
scalable solvers for nonlinear time-dependent differential and algebraic equations and for …

Communication optimization strategies for distributed deep neural network training: A survey

S Ouyang, D Dong, Y Xu, L Xiao - Journal of Parallel and Distributed …, 2021 - Elsevier
Recent trends in high-performance computing and deep learning have led to the
proliferation of studies on large-scale deep neural network training. However, the frequent …

Towards Domain-Specific Network Transport for Distributed DNN Training

H Wang, H Tian, J Chen, X Wan, J Xia, G Zeng… - … USENIX Symposium on …, 2024 - usenix.org
The nature of machine learning (ML) applications exposes rich characteristics to underlying
network transport, yet little work has been done so far to systematically exploit these …

Channel and filter parallelism for large-scale CNN training

N Dryden, N Maruyama, T Moon, T Benson… - Proceedings of the …, 2019 - dl.acm.org
Accelerating large-scale CNN training is needed to keep training times reasonable as
datasets grow larger and models become more complex. Existing frameworks primarily …

The PetscSF scalable communication layer

J Zhang, J Brown, S Balay… - … on Parallel and …, 2021 - ieeexplore.ieee.org
PetscSF, the communication component of the Portable, Extensible Toolkit for Scientific
Computation (PETSc), is designed to provide PETSc's communication infrastructure suitable …

Enabling rapid COVID-19 small molecule drug design through scalable deep learning of generative models

SA Jacobs, T Moon, K McLoughlin… - … Journal of High …, 2021 - journals.sagepub.com
We improved the quality and reduced the time to produce machine learned models for use
in small molecule antiviral design. Our globally asynchronous multi-level parallel training …