A survey of techniques for optimizing deep learning on GPUs
The rise of deep learning (DL) has been fuelled by improvements in accelerators. Due to
its unique features, the GPU remains the most widely used accelerator for DL …
Automated design of deep neural networks: A survey and unified taxonomy
EG Talbi - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
In recent years, research in applying optimization approaches in the automatic design of
deep neural networks has become increasingly popular. Although various approaches have …
Pipe-SGD: A decentralized pipelined SGD framework for distributed deep net training
Distributed training of deep nets is an important technique for addressing present-day
computing challenges such as memory consumption and computational demands. Classical …
Clairvoyant prefetching for distributed machine learning I/O
I/O is emerging as a major bottleneck for machine learning training, especially in distributed
environments. Indeed, at large scale, I/O takes as much as 85% of training time. Addressing …
Toward performance-portable PETSc for GPU-based exascale systems
The Portable, Extensible Toolkit for Scientific Computation (PETSc) library delivers
scalable solvers for nonlinear time-dependent differential and algebraic equations and for …
Communication optimization strategies for distributed deep neural network training: A survey
Recent trends in high-performance computing and deep learning have led to the
proliferation of studies on large-scale deep neural network training. However, the frequent …
Towards Domain-Specific Network Transport for Distributed DNN Training
The nature of machine learning (ML) applications exposes rich characteristics to underlying
network transport, yet little work has been done so far to systematically exploit these …
Channel and filter parallelism for large-scale CNN training
N Dryden, N Maruyama, T Moon, T Benson… - Proceedings of the …, 2019 - dl.acm.org
Accelerating large-scale CNN training is needed to keep training times reasonable as
datasets grow larger and models become more complex. Existing frameworks primarily …
The PetscSF scalable communication layer
PetscSF, the communication component of the Portable, Extensible Toolkit for Scientific
Computation (PETSc), is designed to provide PETSc's communication infrastructure suitable …
Enabling rapid COVID-19 small molecule drug design through scalable deep learning of generative models
SA Jacobs, T Moon, K McLoughlin… - … Journal of High …, 2021 - journals.sagepub.com
We improved the quality and reduced the time to produce machine-learned models for use
in small molecule antiviral design. Our globally asynchronous multi-level parallel training …