Tensor contraction on distributed hybrid architectures using a task-based runtime system
2018•icl.utk.edu
Abstract
The need for predictive simulation of electronic structure in chemistry and materials science calls for fast, reduced-scaling formulations of quantum n-body methods that replace the traditional dense tensors with element-, block-, rank-, and block-rank-sparse (data-sparse) tensors. The resulting highly irregular data structures are a poor match for the imperative, bulk-synchronous parallel programming style, both because the problem is dynamic and because there is no clear domain decomposition that guarantees a fair load balance. The TESSE runtime and its associated programming model aim to support performance-portable composition of applications involving irregular and dynamically changing data. In this paper we report an implementation of irregular dense tensor contraction in a paradigmatic electronic structure application based on the TESSE extension of PaRSEC, a distributed hybrid task runtime system, and analyze the resulting performance on a distributed-memory cluster of multi-GPU nodes. Unprecedented strong scaling and promising efficiency indicate a viable future for task-based programming of complete production-quality reduced-scaling models of electronic structure.
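To make the idea of a block-sparse contraction expressed as independent tasks more concrete, here is a minimal Python sketch. It is an illustration only and does not use the TESSE or PaRSEC APIs: the names `contract`, `matmul`, and `add_into` are hypothetical, the tiles are tiny lists of rows, and a local thread pool stands in for a distributed task runtime. Each pair of stored tiles that shares an inner index becomes one task, so the irregular tile sizes and the sparsity pattern determine the work instead of a fixed domain decomposition.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Dense product of two small tiles stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add_into(c, p):
    """Accumulate tile p into tile c in place (same shape assumed)."""
    for crow, prow in zip(c, p):
        for j, v in enumerate(prow):
            crow[j] += v


def contract(A, B, max_workers=4):
    """C[i, j] = sum over k of A[i, k] @ B[k, j], using stored tiles only.

    A and B are dicts mapping (row_tile, col_tile) to a dense tile.
    Missing keys are treated as zero tiles and generate no work.
    """
    # Index B by its row-tile so each A tile finds its partners quickly.
    b_by_row = defaultdict(list)
    for (k, j), tile in B.items():
        b_by_row[k].append((j, tile))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per matching (A tile, B tile) pair.
        futures = [((i, j), pool.submit(matmul, a_tile, b_tile))
                   for (i, k), a_tile in A.items()
                   for j, b_tile in b_by_row.get(k, [])]

        # Reduce partial products per output tile in the calling thread,
        # which avoids concurrent writes to the same tile.
        C = {}
        for key, fut in futures:
            partial = fut.result()
            if key in C:
                add_into(C[key], partial)
            else:
                C[key] = partial
    return C


# Irregular tiling: tile sizes differ, and A has no (1, 0) tile.
A = {(0, 0): [[1, 2]], (0, 1): [[2]], (1, 1): [[1], [2]]}
B = {(0, 0): [[3], [4]], (1, 0): [[5]]}
C = contract(A, B)
```

In this sketch the reduction is serialized for simplicity. A real task runtime would typically express the accumulation into each output tile as a dependency between tasks, which is outside what this example attempts to show.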