Methods to Manage Data in Self-healing Systems
A Kovalenko, H Kuchuk - Advances in Self-healing Systems Monitoring …, 2022 - Springer
The chapter proposes a set of data management methods in Self-healing Systems. The
proposed methods are focused on taking into account the features of Self-healing Systems …
Reinit: Evaluating the performance of global-restart recovery methods for MPI fault tolerance
Scaling supercomputers comes with an increase in failure rates due to the increasing
number of hardware components. In standard practice, applications are made resilient …
Failure detection and propagation in HPC systems
Building an infrastructure for Exascale applications requires, in addition to many other key
components, a stable and efficient failure detector. This paper describes the design and …
Decentralized network building change in large manufacturing companies towards Industry 4.0
In complex industrial ecosystems together with an increasing global competition, success
depends on a complete value chain transformation. The use of Industry 4.0 standards is …
A survey about self-healing systems (desktop and web application)
AA Hudaib, HN Fakhouri, FE Al Adwan… - Communications and …, 2017 - scirp.org
The complexity of computer architectures, software, web applications, and its large spread
worldwide using the internet and the rapid increase in the number of users in companion …
A failure detector for HPC platforms
Building an infrastructure for exascale applications requires, in addition to many other key
components, a stable and efficient failure detector. This article describes the design and …
Epidemic failure detection and consensus for extreme parallelism
Future extreme-scale high-performance computing systems will be required to work under
frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has …
A short survey of dimensionality reduction techniques
Advancement in data collection has increased the availability of high-dimensional data.
High dimensional data results in data overload which makes the storage and processing …
Match: An MPI fault tolerance benchmark suite
MPI has been ubiquitously deployed in flagship HPC systems aiming to accelerate
distributed scientific applications running on tens of hundreds of processes and compute …
A Fault-Model-Relevant Classification of Consensus Mechanisms for MPI and HPC
Large-scale HPC systems experience failures arising from faults in hardware, software,
and/or networking. Failure rates continue to grow as systems scale up and out. Crash fault …