Event prediction in the big data era: A systematic survey
L Zhao - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Events are occurrences in specific locations, time, and semantics that nontrivially impact
either our society or the nature, such as earthquakes, civil unrest, system failures …
A survey of online failure prediction methods
F Salfner, M Lenk, M Malek - ACM Computing Surveys (CSUR), 2010 - dl.acm.org
With the ever-growing complexity and dynamicity of computer systems, proactive fault
management is an effective approach to enhancing availability. Online failure prediction is …
A survey of AIOps methods for failure management
Modern society is increasingly moving toward complex and distributed computing systems.
The increase in scale and complexity of these systems challenges O&M teams that perform …
Clustering event logs using iterative partitioning
AAO Makanju, AN Zincir-Heywood… - Proceedings of the 15th …, 2009 - dl.acm.org
The importance of event logs, as a source of information in systems and network
management cannot be overemphasized. With the ever increasing size and complexity of …
System log parsing: A survey
Modern information and communication systems have become increasingly challenging to
manage. The ubiquitous system logs contain plentiful information and are thus widely …
AI for IT operations (AIOps) on cloud platforms: Reviews, opportunities and challenges
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big
data generated by IT Operations processes, particularly in cloud infrastructures, to provide …
A systematic literature review on automated log abstraction techniques
Context: Logs are often the first and only information available to software engineers to
understand and debug their systems. Automated log-analysis techniques help software …
PreFix: Switch failure prediction in datacenter networks
In modern datacenter networks (DCNs), failures of network devices are the norm rather than
the exception, and many research efforts have focused on dealing with failures after they …
Survey on models and techniques for root-cause analysis
Automation and computer intelligence to support complex human decisions becomes
essential to manage large and distributed systems in the Cloud and IoT era. Understanding …
A lightweight algorithm for message type extraction in system application logs
A Makanju, AN Zincir-Heywood… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org
Message type or message cluster extraction is an important task in the analysis of system
logs in computer networks. Defining these message types automatically facilitates the …