Performance modeling of gyrokinetic toroidal simulations for a many-tasking runtime system

M Anderson, M Brodowicz, A Kulkarni… - … and Simulation: 4th …, 2014 - Springer
High Performance Computing Systems. Performance Modeling, Benchmarking and …, 2014 - Springer
Abstract
Conventional programming practices on multicore processors in high performance computing architectures are not universally effective in terms of efficiency and scalability for many algorithms in scientific computing. One possible solution for improving efficiency and scalability in applications on this class of machines is the use of a many-tasking runtime system employing many lightweight, concurrent threads. Yet how to estimate, a priori, the potential performance and scalability impact of such runtime systems on existing applications developed around the bulk synchronous parallel (BSP) model is not well understood. In this work, we present a case study of a BSP particle-in-cell benchmark code that has been ported to a many-tasking runtime system. The 3-D Gyrokinetic Toroidal code (GTC) is examined in its original MPI form and compared with a port to the High Performance ParalleX 3 (HPX-3) runtime system. Phase overlap, oversubscription behavior, and work rebalancing in the implementation are explored. Results for GTC obtained with the SST/macro simulator complement the implementation results. Finally, an analytic performance model for GTC is presented to guide future implementation efforts.