Online diagnosis of performance variation in HPC systems using machine learning

O Tuncer, E Ates, Y Zhang, A Turk… - … on Parallel and …, 2018 - ieeexplore.ieee.org
As the size and complexity of high performance computing (HPC) systems grow in line with
advancements in hardware and software technology, HPC systems increasingly suffer from …

Stochastic conformal anomaly detection and resolution for air traffic control

HC Choi, C Deng, H Park, I Hwang - Transportation Research Part C …, 2023 - Elsevier
Safety is of the utmost importance in the air traffic system. In recent years, data-driven
algorithms have emerged to identify anomalous and potentially unsafe operations based on …

InstantOps: A Joint Approach to System Failure Prediction and Root Cause Identification in Microserivces Cloud-Native Applications

R Rouf, M Rasolroveicy, M Litoiu, S Nagar… - Proceedings of the 15th …, 2024 - dl.acm.org
As microservice and cloud computing operations increasingly adopt automation, the
importance of models for fostering resilient and efficient adaptive architectures becomes …

Albadross: Active learning based anomaly diagnosis for production HPC systems

B Aksar, E Sencan, B Schwaller, O Aaziz… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Diagnosing causes of performance variations in High-Performance Computing (HPC)
systems is a daunting challenge due to the systems' scale and complexity. Variations in …

Method and system for performing effective orchestration of cognitive functions in distributed heterogeneous communication network

R Ravichandran, KM Manoharan… - US Patent …, 2022 - Google Patents
This disclosure relates to method and system for performing effective orchestration of
cognitive functions (CFs) in a distributed heterogeneous communication network. In one …

E2EWatch: an end-to-end anomaly diagnosis framework for production HPC systems

B Aksar, B Schwaller, O Aaziz, VJ Leung… - Euro-Par 2021: Parallel …, 2021 - Springer
In today's High-Performance Computing (HPC) systems, application performance
variations are among the most vital challenges as they adversely affect system efficiency …

Anomaly detection in the context of long-term cloud resource usage planning

P Nawrocki, W Sus - Knowledge and Information Systems, 2022 - Springer
This paper describes a new approach to automatic long-term cloud resource usage
planning with a novel hybrid anomaly detection mechanism. It analyzes existing anomaly …

Log anomaly to resolution: AI based proactive incident remediation

R Mahindru, H Kumar, S Bansal - 2021 36th IEEE/ACM …, 2021 - ieeexplore.ieee.org
Based on 2020 SRE report, 80% of SREs work on postmortem analysis of incidents due to
lack of provided information and 16% of toil come from investigating false …

Anomaly detection in scientific datasets using sparse representation

A Moon, M Kim, J Chen, SW Son - Proceedings of the First Workshop on …, 2023 - dl.acm.org
As the size and complexity of high-performance computing (HPC) systems keep growing,
scientists' ability to trust the data produced is paramount due to potential data corruption for …

Cloud-based autonomic computing framework for securing SCADA systems

S Nazir, S Patel, D Patel - Innovations, algorithms, and applications in …, 2020 - igi-global.com
This chapter proposes an autonomic computing security framework for protecting cloud-
based supervisory control and data acquisition (SCADA) systems against cyber threats …