Fault-tolerance in the scope of cloud computing

AU Rehman, RL Aguiar, JP Barraca - IEEE Access, 2022 - ieeexplore.ieee.org
Fault-tolerance methods are required to ensure high availability and high reliability in cloud
computing environments. In this survey, we address fault-tolerance in the scope of cloud …

Fault-tolerance in the scope of software-defined networking (SDN)

AU Rehman, RL Aguiar, JP Barraca - IEEE Access, 2019 - ieeexplore.ieee.org
Fault-tolerance is an essential aspect of network resilience. Fault-tolerance mechanisms are
required to ensure high availability and high reliability in systems. The advent of software …

Resiliency in numerical algorithm design for extreme scale simulations

E Agullo, M Altenbernd, H Anzt… - … Journal of High …, 2022 - journals.sagepub.com
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …

Resilience design patterns: A structured approach to resilience at extreme scale

S Hukerikar, C Engelmann - arXiv preprint arXiv:1708.07422, 2017 - arxiv.org
Reliability is a serious concern for future extreme-scale high-performance computing (HPC)
systems. While the HPC community has developed various resilience solutions, the solution …

Resilience in the Cyberworld: Definitions, Features and Models

E Vogel, Z Dyka, D Klann, P Langendörfer - Future Internet, 2021 - mdpi.com
Resilience is a feature that is gaining more and more attention in computer science and
computer engineering. However, the definition of resilience for the cyber landscape …

The INTERSECT open federated architecture for the laboratory of the future

C Engelmann, O Kuchar, S Boehm, MJ Brim… - Smoky Mountains …, 2022 - Springer
A federated instrument-to-edge-to-center architecture is needed to autonomously collect,
transfer, store, process, curate, and archive scientific data and reduce human-in-the-loop …

A pattern language for high-performance computing resilience

S Hukerikar, C Engelmann - … of the 22nd European Conference on …, 2017 - dl.acm.org
High-performance computing systems (HPC) provide powerful capabilities for modeling,
simulation, and data analytics for a broad class of computational problems. They enable …

INTERSECT Architecture Specification: Use Case Design Patterns (Version 0.9)

C Engelmann, S Somnath - 2023 - osti.gov
Connecting scientific instruments and robot-controlled laboratories with computing and data
resources at the edge, the Cloud or the high-performance computing (HPC) center enables …

Science Use Case Design Patterns for Autonomous Experiments

C Engelmann, S Somnath - … of the 28th European Conference on Pattern …, 2023 - dl.acm.org
Connecting scientific instruments and robot-controlled laboratories with computing and data
resources at the edge, the Cloud or the high-performance computing (HPC) center enables …

Pattern-based modeling of multiresilience solutions for high-performance computing

RA Ashraf, S Hukerikar, C Engelmann - Proceedings of the 2018 ACM …, 2018 - dl.acm.org
Resiliency is the ability of large-scale high-performance computing (HPC) applications to
gracefully handle errors, and recover from failures. In this paper, we propose a pattern …