Anomaly detection using autoencoders in high performance computing systems

A Borghesi, A Bartolini, M Lombardi, M Milano… - Proceedings of the …, 2019 - ojs.aaai.org
Anomaly detection in supercomputers is a very difficult problem due to the big scale of the
systems and the high number of components. The current state of the art for automated …

Counterfactual explanations for multivariate time series

E Ates, B Aksar, VJ Leung… - … conference on applied …, 2021 - ieeexplore.ieee.org
Multivariate time series are used in many science and engineering domains, including
health-care, astronomy, and high-performance computing. A recent trend is to use machine …

A semisupervised autoencoder-based approach for anomaly detection in high performance computing systems

A Borghesi, A Bartolini, M Lombardi, M Milano… - … Applications of Artificial …, 2019 - Elsevier
High Performance Computing (HPC) systems are complex machines with
heterogeneous components that can break or malfunction. Automated anomaly detection in …

Online anomaly detection in HPC systems

A Borghesi, A Libri, L Benini… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Reliability is a cumbersome problem in High Performance Computing Systems and Data
Centers evolution. During operation, several types of fault conditions or anomalies can arise …

pAElla: Edge AI-Based Real-Time Malware Detection in Data Centers

A Libri, A Bartolini, L Benini - IEEE Internet of Things Journal, 2020 - ieeexplore.ieee.org
The increasing use of Internet-of-Things (IoT) devices for monitoring a wide spectrum of
applications, along with the challenges of “big data” streaming support they often require for …

Paving the way toward energy-aware and automated datacentre

A Bartolini, F Beneventi, A Borghesi… - … Proceedings of the …, 2019 - dl.acm.org
Energy efficiency and datacentre automation are critical targets of the research and
deployment agenda of CINECA and its research partners in the Energy Efficient System …

E2EWatch: an end-to-end anomaly diagnosis framework for production HPC systems

B Aksar, B Schwaller, O Aaziz, VJ Leung… - Euro-Par 2021: Parallel …, 2021 - Springer
In today's High-Performance Computing (HPC) systems, application performance
variations are among the most vital challenges as they adversely affect system efficiency …

Pricing schemes for energy-efficient HPC systems: Design and exploration

A Borghesi, A Bartolini, M Milano… - … International Journal of …, 2019 - journals.sagepub.com
Energy efficiency is of paramount importance for the sustainability of high performance
computing (HPC) systems. Energy consumption limits the peak performance of …

Lynsyn and LynsynLite: The STHEM power measurement units

A Djupdal, B Gottschall, F Ghasemi, M Jahre - Towards Ubiquitous Low …, 2021 - Springer
The end of Dennard scaling has resulted in power or energy consumption becoming first-
order design constraints of virtually every computer system. A key challenge is to attribute …

MetricQ: A scalable infrastructure for processing high-resolution time series data

T Ilsche, D Hackenberg, R Schöne… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
In this paper we present MetricQ, a novel infrastructure for collecting, archiving, and
analyzing sensor data. Core components of MetricQ are a scalable message broker based …