Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution
Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction, 2022 (dl.acm.org)
Training models with massive inputs is a significant challenge in the development of Deep Learning pipelines to process very large digital image datasets as required by Whole Slide Imaging (WSI) in computational pathology and analysis of brain fMRI images in computational neuroscience. Graphics Processing Units (GPUs) represent the primary workhorse in training and inference of Deep Learning models. In order to use GPUs to run inference or training on a neural network pipeline, state-of-the-art machine learning frameworks like PyTorch and TensorFlow currently require that the collective memory on the GPUs be larger than the size of the activations at any stage in the pipeline. Therefore, existing Deep Learning pipelines for these use cases have been forced to adopt sub-optimal "patch-based" modeling approaches, where each image is processed in small segments. In this paper, we present a solution to this problem by employing tiling in conjunction with checkpointing, thereby enabling arbitrarily large images to be directly processed, irrespective of the size of global memory on a GPU and the number of available GPUs. Experimental results using PyTorch demonstrate enhanced functionality and performance over existing frameworks.
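The abstract names two ingredients, spatial tiling and checkpointing, but gives no code. The sketch below is one plausible way to combine them in PyTorch; it is not the paper's actual segmented fused-tiled implementation. TiledSegment, tile_size, and halo are hypothetical names introduced here for illustration; the only library API used is torch.utils.checkpoint.checkpoint, which is real. The sketch assumes the stage preserves spatial size (stride-1, "same"-padded convolutions) and that halo is at least the stage's receptive-field radius, so the cropped per-tile outputs match what full-image execution would produce.

    # Minimal sketch: spatial tiling + gradient checkpointing (assumed design,
    # not the authors' API). Peak GPU memory is bounded by one tile's
    # activations rather than the whole image's.
    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    class TiledSegment(nn.Module):
        """Applies a spatial-size-preserving stage tile by tile.

        Intermediate activations inside the stage are discarded after the
        forward pass and recomputed during backward (checkpointing).
        """
        def __init__(self, stage: nn.Module, tile_size: int, halo: int):
            super().__init__()
            self.stage = stage          # stride-1, "same"-padded sub-network
            self.tile_size = tile_size
            self.halo = halo            # >= receptive-field radius of stage

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            _, _, h, w = x.shape
            rows = []
            for i in range(0, h, self.tile_size):
                cols = []
                for j in range(0, w, self.tile_size):
                    # Read the tile plus a halo so the tile's border pixels
                    # still see their full receptive field.
                    i0, j0 = max(i - self.halo, 0), max(j - self.halo, 0)
                    i1 = min(i + self.tile_size + self.halo, h)
                    j1 = min(j + self.tile_size + self.halo, w)
                    out = checkpoint(self.stage, x[:, :, i0:i1, j0:j1],
                                     use_reentrant=False)
                    # Crop the halo off the output before stitching.
                    oi, oj = i - i0, j - j0
                    cols.append(out[:, :, oi:oi + min(self.tile_size, h - i),
                                          oj:oj + min(self.tile_size, w - j)])
                rows.append(torch.cat(cols, dim=3))
            return torch.cat(rows, dim=2)

Usage under the same assumptions: two 3x3 convolutions have a receptive-field radius of 2, so halo=2 suffices.

    stage = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
    seg = TiledSegment(stage, tile_size=512, halo=2)
    y = seg(torch.randn(1, 3, 4096, 4096, requires_grad=True))
    y.mean().backward()   # activations inside `stage` recomputed per tile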