A survey on automated log analysis for reliability engineering
Logs are semi-structured text generated by logging statements in software source code. In
recent decades, software logs have become imperative in the reliability assurance …
IT infrastructure anomaly detection and failure handling: A systematic literature review focusing on datasets, log preprocessing, machine & deep learning approaches …
DA Bhanage, AV Pawar, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org
Nowadays, reliability assurance is crucial in components of IT infrastructures. Unavailability
of any element or connection results in downtime and triggers monetary and performance …
Semparser: A semantic parser for log analytics
Logs, being run-time information automatically generated by software, record system events
and activities with their timestamps. Before obtaining more insights into the run-time status of …
LogKG: Log Failure Diagnosis through Knowledge Graph
Y Sui, Y Zhang, J Sun, T Xu, S Zhang… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Logs are one of the most valuable data to describe the running state of services. Failure
diagnosis through logs is crucial for service reliability and security. The current automatic log …
Quality evaluation of modern code reviews through intelligent biometric program comprehension
Code review is an essential practice in software engineering to spot code defects in the
early stages of software development. Modern code reviews (e.g., acceptance or rejection of …
Fail through the cracks: Cross-system interaction failures in modern cloud systems
Modern cloud systems are orchestrations of independent and interacting (sub-) systems,
each specializing in important services (e.g., data processing, storage, resource …
Fault injection analytics: A novel approach to discover failure modes in cloud-computing systems
Cloud computing systems fail in complex and unexpected ways due to unexpected
combinations of events and interactions between hardware and software components. Fault …
Incident-aware duplicate ticket aggregation for cloud systems
In cloud systems, incidents are potential threats to customer satisfaction and business
revenue. When customers are affected by incidents, they often request customer support …
An intelligent framework for timely, accurate, and comprehensive cloud incident detection
Cloud incidents (service interruptions or performance degradation) dramatically degrade the
reliability of large-scale cloud systems, causing customer dissatisfaction and revenue loss …
Understanding and predicting incident mitigation time
Context: Incident management plays a significant role in online service systems. Incidents
should be mitigated as soon as possible in order to achieve high service stability. However …