Baler: deterministic, lossless log message clustering tool

R Sipos, D Fradkin, F Moerchen, Z Wang - Proceedings of the 20th ACM …, 2014 - dl.acm.org

Success of manufacturing companies largely depends on reliability of their products.
Scheduled maintenance is widely used to ensure that equipment is operating correctly so as …

Cited by 288 · Related articles · All 9 versions

[PDF] arxiv.org

A comprehensive survey of logging in software: From logging statements automation to log mining and analysis

S Gholamian, PAS Ward - arXiv preprint arXiv:2110.12489, 2021 - arxiv.org

Logs are widely used to record runtime information of software systems, such as the
timestamp and the importance of an event, the unique ID of the source of the log, and a part …

Cited by 21 · Related articles · All 2 versions

[HTML] sciencedirect.com

[HTML] System log clustering approaches for cyber security applications: A survey

M Landauer, F Skopik, M Wurzenberger, A Rauber - Computers & Security, 2020 - Elsevier

Log files give insight into the state of a computer system and enable the detection of
anomalous events relevant to cyber security. However, automatically analyzing log data is …

Cited by 123 · Related articles · All 9 versions

[PDF] arxiv.org

System log parsing: A survey

T Zhang, H Qiu, G Castellano, M Rifai… - … on Knowledge and …, 2023 - ieeexplore.ieee.org

Modern information and communication systems have become increasingly challenging to
manage. The ubiquitous system logs contain plentiful information and are thus widely …

Cited by 52 · Related articles · All 7 versions

ClusterCockpit—A web application for job-specific performance monitoring

J Eitzinger, T Gruber, A Afzal, T Zeiser… - … Conference on Cluster …, 2019 - ieeexplore.ieee.org

Monitoring is a common component of HPC system software. Up to now, monitoring focused
mainly on health checking and system level performance as well as on job scheduler …

Cited by 32 · Related articles · All 2 versions

[BOOK][B] Smart Log Data Analytics

F Skopik, M Wurzenberger, M Landauer - 2021 - Springer

Prudent event monitoring and logging are the only means that allow system operators and
security teams to truly understand how complex systems are utilized. Log data are essential …

Cited by 18 · Related articles · All 6 versions

[PDF] ieee.org

A survey of log-correlation tools for failure diagnosis and prediction in cluster systems

E Chuah, A Jhumka, M Malek, N Suri - IEEE Access, 2022 - ieeexplore.ieee.org

System logs are the first source of information available to system designers to analyze and
troubleshoot their cluster systems. For example, High-Performance Computing (HPC) …

Cited by 2 · Related articles · All 6 versions

[PDF] nsf.gov

Live forensics for HPC systems: A case study on distributed storage systems

S Jha, S Cui, SS Banerjee, T Xu, J Enos… - … Conference for High …, 2020 - ieeexplore.ieee.org

Large-scale high-performance computing systems frequently experience a wide range of
failure modes, such as reliability failures (e.g., hang or crash), and resource overload-related …

Cited by 18 · Related articles · All 8 versions

[PDF] osti.gov

[PDF] Enabling Advanced Operational Analysis Through Multi-subsystem Data Integration on Trinity.

JM Brandt, D DeBonis, AC Gentile, J Lujan, C Martin… - 2015 - osti.gov

Operations management of the ACES Trinity platform will rely on data from a variety of
sources including System Environment Data Collections (SEDC); node level information …

Cited by 23 · Related articles · All 6 versions

[PDF] osti.gov

Design and implementation of a scalable HPC monitoring system

S Sanchez, A Bonnie, G Van Heule… - 2016 IEEE …, 2016 - ieeexplore.ieee.org

Over the past decade, platforms at Los Alamos National Laboratory (LANL) have
experienced large increases in complexity and scale to reach computational targets. The …

Cited by 21 · Related articles · All 3 versions