Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution
Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction, 2022 (dl.acm.org)
Training models with massive inputs is a significant challenge in the development of Deep Learning pipelines to process very large digital image datasets as required by Whole Slide Imaging (WSI) in computational pathology and analysis of brain fMRI images in computational neuroscience. Graphics Processing Units (GPUs) represent the primary workhorse in training and inference of Deep Learning models. In order to use GPUs to run inference or training on a neural network pipeline, state-of-the-art machine learning frameworks like PyTorch and TensorFlow currently require that the collective memory on the GPUs be larger than the size of the activations at any stage in the pipeline. Therefore, existing Deep Learning pipelines for these use cases have been forced to adopt sub-optimal "patch-based" modeling approaches, where each image is processed in small segments. In this paper, we present a solution to this problem by employing tiling in conjunction with checkpointing, thereby enabling arbitrarily large images to be directly processed, irrespective of the size of global memory on a GPU and the number of available GPUs. Experimental results using PyTorch demonstrate enhanced functionality and performance over existing frameworks.
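The abstract names two ingredients, spatial tiling and checkpointing, but gives no code. The sketch below is one plausible way to combine them in PyTorch; it is not the paper's actual segmented fused-tiled implementation. TiledSegment, tile_size, and halo are hypothetical names introduced here for illustration; the only library API used is torch.utils.checkpoint.checkpoint, which is real. The sketch assumes the stage preserves spatial size (stride-1, "same"-padded convolutions) and that halo is at least the stage's receptive-field radius, so the cropped per-tile outputs match what full-image execution would produce.

    # Minimal sketch: spatial tiling + gradient checkpointing (assumed design,
    # not the authors' API). Peak GPU memory is bounded by one tile's
    # activations rather than the whole image's.
    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    class TiledSegment(nn.Module):
        """Applies a spatial-size-preserving stage tile by tile.

        Intermediate activations inside the stage are discarded after the
        forward pass and recomputed during backward (checkpointing).
        """
        def __init__(self, stage: nn.Module, tile_size: int, halo: int):
            super().__init__()
            self.stage = stage          # stride-1, "same"-padded sub-network
            self.tile_size = tile_size
            self.halo = halo            # >= receptive-field radius of stage

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            _, _, h, w = x.shape
            rows = []
            for i in range(0, h, self.tile_size):
                cols = []
                for j in range(0, w, self.tile_size):
                    # Read the tile plus a halo so the tile's border pixels
                    # still see their full receptive field.
                    i0, j0 = max(i - self.halo, 0), max(j - self.halo, 0)
                    i1 = min(i + self.tile_size + self.halo, h)
                    j1 = min(j + self.tile_size + self.halo, w)
                    out = checkpoint(self.stage, x[:, :, i0:i1, j0:j1],
                                     use_reentrant=False)
                    # Crop the halo off the output before stitching.
                    oi, oj = i - i0, j - j0
                    cols.append(out[:, :, oi:oi + min(self.tile_size, h - i),
                                          oj:oj + min(self.tile_size, w - j)])
                rows.append(torch.cat(cols, dim=3))
            return torch.cat(rows, dim=2)

Usage under the same assumptions: two 3x3 convolutions have a receptive-field radius of 2, so halo=2 suffices.

    stage = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
    seg = TiledSegment(stage, tile_size=512, halo=2)
    y = seg(torch.randn(1, 3, 4096, 4096, requires_grad=True))
    y.mean().backward()   # activations inside `stage` recomputed per tile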