CPU+GPU programming of stencil computations for resource-efficient use of GPU clusters

M Sourouri, J Langguth, F Spiga… - 2015 IEEE 18th …, 2015 - ieeexplore.ieee.org
2015 IEEE 18th International Conference on Computational Science …, 2015
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU programming approach that combines MPI, OpenMP and CUDA. To effectively hide the overhead of various inter- and intra-node communications, a new level of task parallelism is introduced on top of the conventional data parallelism. Combined with a suitable workload division between the CPUs and GPUs, our CPU+GPU programming approach is able to fully utilize the different processing units. The programming details and achievable performance are exemplified by a widely used 3D 7-point stencil computation, which shows high performance and scaling in experiments using up to 64 CPU-GPU nodes.
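As a rough illustration of the two ideas named in the abstract (a 3D 7-point stencil and a workload division between CPU and GPU), the following is a minimal sketch in plain Python. It is not the authors' MPI/OpenMP/CUDA code, which the abstract does not show. The function names, the coefficients `c0` and `c1`, the split along z-planes, and the `gpu_fraction` parameter are all assumptions made for illustration; both partitions are computed by the same serial loop here, standing in for a CUDA kernel and an OpenMP loop.

```python
def make_grid(nz, ny, nx, value=0.0):
    """Create an nz x ny x nx grid as nested lists (hypothetical helper)."""
    return [[[value for _ in range(nx)] for _ in range(ny)] for _ in range(nz)]


def sweep_planes(u, out, z_begin, z_end, c0=0.4, c1=0.1):
    """Apply a 7-point stencil to interior points of planes z_begin..z_end-1.

    Each output point combines the centre value with its six axis
    neighbours. The coefficients c0 and c1 are assumed values.
    """
    ny, nx = len(u[0]), len(u[0][0])
    for z in range(z_begin, z_end):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                neighbours = (u[z - 1][y][x] + u[z + 1][y][x]
                              + u[z][y - 1][x] + u[z][y + 1][x]
                              + u[z][y][x - 1] + u[z][y][x + 1])
                out[z][y][x] = c0 * u[z][y][x] + c1 * neighbours


def split_interior(nz, gpu_fraction):
    """Divide the interior z-planes 1..nz-2 into a 'GPU' and a 'CPU' range."""
    interior = nz - 2
    gpu_planes = round(interior * gpu_fraction)
    split = 1 + gpu_planes
    return (1, split), (split, nz - 1)


def jacobi_step(u, gpu_fraction=0.75):
    """One sweep over the grid, with the work divided into two partitions."""
    out = [[row[:] for row in plane] for plane in u]  # boundaries are copied
    gpu_range, cpu_range = split_interior(len(u), gpu_fraction)
    sweep_planes(u, out, *gpu_range)  # stand-in for the GPU partition
    sweep_planes(u, out, *cpu_range)  # stand-in for the CPU partition
    return out
```

Because both partitions read only the old grid `u` and write disjoint planes of `out`, they are independent of each other within a sweep, which is what would allow them to run concurrently on different processing units. Choosing `gpu_fraction` to match the relative throughput of the two device types is the kind of workload division the abstract refers to; the value 0.75 above is arbitrary.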