Event prediction in the big data era: A systematic survey

L Zhao - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Events are occurrences in specific locations, time, and semantics that nontrivially impact
either our society or the nature, such as earthquakes, civil unrest, system failures …

A survey of online failure prediction methods

F Salfner, M Lenk, M Malek - ACM Computing Surveys (CSUR), 2010 - dl.acm.org
With the ever-growing complexity and dynamicity of computer systems, proactive fault
management is an effective approach to enhancing availability. Online failure prediction is …

A survey of AIOps methods for failure management

P Notaro, J Cardoso, M Gerndt - ACM Transactions on Intelligent …, 2021 - dl.acm.org
Modern society is increasingly moving toward complex and distributed computing systems.
The increase in scale and complexity of these systems challenges O&M teams that perform …

Clustering event logs using iterative partitioning

AAO Makanju, AN Zincir-Heywood… - Proceedings of the 15th …, 2009 - dl.acm.org
The importance of event logs, as a source of information in systems and network
management cannot be overemphasized. With the ever increasing size and complexity of …

System log parsing: A survey

T Zhang, H Qiu, G Castellano, M Rifai… - … on Knowledge and …, 2023 - ieeexplore.ieee.org
Modern information and communication systems have become increasingly challenging to
manage. The ubiquitous system logs contain plentiful information and are thus widely …

AI for IT operations (AIOps) on cloud platforms: Reviews, opportunities and challenges

Q Cheng, D Sahoo, A Saha, W Yang, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big
data generated by IT Operations processes, particularly in cloud infrastructures, to provide …

A systematic literature review on automated log abstraction techniques

D El-Masri, F Petrillo, YG Guéhéneuc… - Information and …, 2020 - Elsevier
Context: Logs are often the first and only information available to software engineers to
understand and debug their systems. Automated log-analysis techniques help software …

PreFix: Switch failure prediction in datacenter networks

S Zhang, Y Liu, W Meng, Z Luo, J Bu, S Yang… - Proceedings of the …, 2018 - dl.acm.org
In modern datacenter networks (DCNs), failures of network devices are the norm rather than
the exception, and many research efforts have focused on dealing with failures after they …

Survey on models and techniques for root-cause analysis

M Solé, V Muntés-Mulero, AI Rana… - arXiv preprint arXiv …, 2017 - arxiv.org
Automation and computer intelligence to support complex human decisions becomes
essential to manage large and distributed systems in the Cloud and IoT era. Understanding …

A lightweight algorithm for message type extraction in system application logs

A Makanju, AN Zincir-Heywood… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org
Message type or message cluster extraction is an important task in the analysis of system
logs in computer networks. Defining these message types automatically facilitates the …