Application level fault recovery: Using Fault-Tolerant Open MPI in a PDE solver

MM Ali, J Southern, P Strazdins… - 2014 IEEE International …, 2014 - ieeexplore.ieee.org
A fault-tolerant version of Open Message Passing Interface (Open MPI), based on the draft
User Level Failure Mitigation (ULFM) proposal of the MPI Forum's Fault Tolerance Working …

Recent developments in the theory and application of the sparse grid combination technique

M Hegland, B Harding, C Kowitz, D Pflüger… - Software for Exascale …, 2016 - Springer
Substantial modifications of both the choice of the grids, the combination coefficients, the
parallel data structures and the algorithms used for the combination technique lead to …

Complex scientific applications made fault-tolerant with the sparse grid combination technique

MM Ali, PE Strazdins, B Harding… - … International Journal of …, 2016 - journals.sagepub.com
Ultra-large–scale simulations via solving partial differential equations (PDEs) require very
large computational systems for their timely solution. Studies have shown the rate of failure grows …

The sparse grid combination technique for computing eigenvalues in linear gyrokinetics

C Kowitz, M Hegland - Procedia Computer Science, 2013 - Elsevier
Using the five-dimensional gyrokinetic equations for simulations of hot fusion plasmas
requires discretizations with a lot of degrees of freedom due to the curse of dimensionality. The …

Fault tolerant computation with the sparse grid combination technique

B Harding, M Hegland, J Larson, J Southern - SIAM Journal on Scientific …, 2015 - SIAM
This paper continues to develop a fault tolerant extension of the sparse grid combination
technique recently proposed in [B. Harding and M. Hegland, ANZIAM J. Electron. Suppl., 54 …

EXAHD: an exa-scalable two-level sparse grid approach for higher-dimensional problems in plasma physics and beyond

D Pflüger, HJ Bungartz, M Griebel, F Jenko… - Euro-Par 2014: Parallel …, 2014 - Springer
High-dimensional problems pose a challenge for tomorrow's supercomputing. Problems that
require the joint discretization of more dimensions than space and time are among the most …

A massively parallel combination technique for the solution of high-dimensional PDEs

M Heene - 2018 - core.ac.uk
The solution of high-dimensional problems, especially high-dimensional partial differential
equations (PDEs) that require the joint discretization of more than the usual three spatial …

Towards a fault-tolerant, scalable implementation of GENE

AP Hinojosa, C Kowitz, M Heene, D Pflüger… - Recent Trends in …, 2015 - Springer
We consider the HPC challenge of fault tolerance in the context of plasma physics
simulations using the sparse grid combination technique. In the combination technique …

A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion

S Lee, IG Kevrekidis, GE Karniadakis - Journal of Computational Physics, 2017 - Elsevier
Exascale-level simulations require fault-resilient algorithms that are robust against repeated
and expected software and/or hardware failures during computations, which may render the …

Global communication schemes for the numerical solution of high-dimensional PDEs

P Hupp, M Heene, R Jacob, D Pflüger - Parallel Computing, 2016 - Elsevier
The numerical treatment of high-dimensional partial differential equations is among the most
compute-hungry problems and in urgent need for current and future high-performance …