Log-based predictive maintenance
R Sipos, D Fradkin, F Moerchen, Z Wang - Proceedings of the 20th ACM …, 2014 - dl.acm.org
Success of manufacturing companies largely depends on reliability of their products.
Scheduled maintenance is widely used to ensure that equipment is operating correctly so as …
A comprehensive survey of logging in software: From logging statements automation to log mining and analysis
S Gholamian, PAS Ward - arXiv preprint arXiv:2110.12489, 2021 - arxiv.org
Logs are widely used to record runtime information of software systems, such as the
timestamp and the importance of an event, the unique ID of the source of the log, and a part …
[HTML] System log clustering approaches for cyber security applications: A survey
Log files give insight into the state of a computer system and enable the detection of
anomalous events relevant to cyber security. However, automatically analyzing log data is …
System log parsing: A survey
Modern information and communication systems have become increasingly challenging to
manage. The ubiquitous system logs contain plentiful information and are thus widely …
ClusterCockpit—A web application for job-specific performance monitoring
J Eitzinger, T Gruber, A Afzal, T Zeiser… - … Conference on Cluster …, 2019 - ieeexplore.ieee.org
Monitoring is a common component of HPC system software. Up to now, monitoring focused
mainly on health checking and system level performance as well as on job scheduler …
[BOOK] Smart Log Data Analytics
Prudent event monitoring and logging are the only means that allow system operators and
security teams to truly understand how complex systems are utilized. Log data are essential …
A survey of log-correlation tools for failure diagnosis and prediction in cluster systems
System logs are the first source of information available to system designers to analyze and
troubleshoot their cluster systems. For example, High-Performance Computing (HPC) …
Live forensics for HPC systems: A case study on distributed storage systems
Large-scale high-performance computing systems frequently experience a wide range of
failure modes, such as reliability failures (e.g., hang or crash), and resource overload-related …
[PDF] Enabling Advanced Operational Analysis Through Multi-subsystem Data Integration on Trinity.
JM Brandt, D DeBonis, AC Gentile, J Lujan, C Martin… - 2015 - osti.gov
Operations management of the ACES Trinity platform will rely on data from a variety of
sources including System Environment Data Collections (SEDC); node level information …
Design and implementation of a scalable HPC monitoring system
S Sanchez, A Bonnie, G Van Heule… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
Over the past decade, platforms at Los Alamos National Laboratory (LANL) have
experienced large increases in complexity and scale to reach computational targets. The …