Anomaly detection using autoencoders in high performance computing systems
Anomaly detection in supercomputers is a very difficult problem due to the big scale of the
systems and the high number of components. The current state of the art for automated …
Counterfactual explanations for multivariate time series
Multivariate time series are used in many science and engineering domains, including
health-care, astronomy, and high-performance computing. A recent trend is to use machine …
A semisupervised autoencoder-based approach for anomaly detection in high performance computing systems
High Performance Computing (HPC) systems are complex machines with
heterogeneous components that can break or malfunction. Automated anomaly detection in …
Online anomaly detection in HPC systems
Reliability is a cumbersome problem in High Performance Computing Systems and Data
Centers evolution. During operation, several types of fault conditions or anomalies can arise …
pAElla: Edge AI-Based Real-Time Malware Detection in Data Centers
The increasing use of Internet-of-Things (IoT) devices for monitoring a wide spectrum of
applications, along with the challenges of “big data” streaming support they often require for …
Paving the way toward energy-aware and automated datacentre
Energy efficiency and datacentre automation are critical targets of the research and
deployment agenda of CINECA and its research partners in the Energy Efficient System …
E2EWatch: an end-to-end anomaly diagnosis framework for production HPC systems
In today's High-Performance Computing (HPC) systems, application performance
variations are among the most vital challenges as they adversely affect system efficiency …
Pricing schemes for energy-efficient HPC systems: Design and exploration
Energy efficiency is of paramount importance for the sustainability of high performance
computing (HPC) systems. Energy consumption limits the peak performance of …
Lynsyn and LynsynLite: The STHEM power measurement units
The end of Dennard scaling has resulted in power or energy consumption becoming first-
order design constraints of virtually every computer system. A key challenge is to attribute …
MetricQ: A scalable infrastructure for processing high-resolution time series data
In this paper we present MetricQ, a novel infrastructure for collecting, archiving, and
analyzing sensor data. Core components of MetricQ are a scalable message broker based …