Fault-tolerance in the scope of cloud computing
Fault-tolerance methods are required to ensure high availability and high reliability in cloud
computing environments. In this survey, we address fault-tolerance in the scope of cloud …
Fault-tolerance in the scope of software-defined networking (SDN)
Fault-tolerance is an essential aspect of network resilience. Fault-tolerance mechanisms are
required to ensure high availability and high reliability in systems. The advent of software …
Resiliency in numerical algorithm design for extreme scale simulations
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …
Resilience design patterns: A structured approach to resilience at extreme scale
S Hukerikar, C Engelmann - arXiv preprint arXiv:1708.07422, 2017 - arxiv.org
Reliability is a serious concern for future extreme-scale high-performance computing (HPC)
systems. While the HPC community has developed various resilience solutions, the solution …
Resilience in the Cyberworld: Definitions, Features and Models
E Vogel, Z Dyka, D Klann, P Langendörfer - Future Internet, 2021 - mdpi.com
Resilience is a feature that is gaining more and more attention in computer science and
computer engineering. However, the definition of resilience for the cyber landscape …
The INTERSECT open federated architecture for the laboratory of the future
A federated instrument-to-edge-to-center architecture is needed to autonomously collect,
transfer, store, process, curate, and archive scientific data and reduce human-in-the-loop …
A pattern language for high-performance computing resilience
S Hukerikar, C Engelmann - … of the 22nd European Conference on …, 2017 - dl.acm.org
High-performance computing systems (HPC) provide powerful capabilities for modeling,
simulation, and data analytics for a broad class of computational problems. They enable …
INTERSECT Architecture Specification: Use Case Design Patterns (Version 0.9)
C Engelmann, S Somnath - 2023 - osti.gov
Connecting scientific instruments and robot-controlled laboratories with computing and data
resources at the edge, the Cloud or the high-performance computing (HPC) center enables …
Science Use Case Design Patterns for Autonomous Experiments
C Engelmann, S Somnath - … of the 28th European Conference on Pattern …, 2023 - dl.acm.org
Connecting scientific instruments and robot-controlled laboratories with computing and data
resources at the edge, the Cloud or the high-performance computing (HPC) center enables …
Pattern-based modeling of multiresilience solutions for high-performance computing
Resiliency is the ability of large-scale high-performance computing (HPC) applications to
gracefully handle errors, and recover from failures. In this paper, we propose a pattern …