Nezha: Interpretable fine-grained root causes analysis for microservices on multi-modal observability data

G Yu, P Chen, Y Li, H Chen, X Li, Z Zheng - Proceedings of the 31st …, 2023 - dl.acm.org
Root cause analysis (RCA) in large-scale microservice systems is a critical and challenging
task. To understand and localize root causes of unexpected faults, modern observability …

Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis

S Zhang, S Xia, W Fan, B Shi, X Xiong, Z Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern microservice systems have gained widespread adoption due to their high
scalability, flexibility, and extensibility. However, the characteristics of independent …

Predictive monitoring against pattern regular languages

Z Ang, U Mathur - Proceedings of the ACM on Programming Languages, 2024 - dl.acm.org
While current bug detection techniques for concurrent software focus on unearthing
low-level issues such as data races or deadlocks, they often fall short of discovering more …

Logrule: Efficient structured log mining for root cause analysis

P Notaro, S Haeri, J Cardoso… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Accurate, timely Root Cause Analysis (RCA) is essential to successful IT operations as a
primary step to incident remediation. RCA automation using data mining techniques in large …

Explaining mispredictions of machine learning models using rule induction

J Cito, I Dillig, S Kim, V Murali, S Chandra - … of the 29th ACM joint meeting …, 2021 - dl.acm.org
While machine learning (ML) models play an increasingly prevalent role in many software
engineering tasks, their prediction accuracy is often problematic. When these models do …

Conan: Diagnosing batch failures for cloud systems

L Li, X Zhang, S He, Y Kang, H Zhang… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Failure diagnosis is critical to the maintenance of large-scale cloud systems, which has
attracted tremendous attention from academia and industry over the last decade. In this …

What is an app store? The software engineering perspective

W Zhu, S Proksch, DM German, MW Godfrey… - Empirical Software …, 2024 - Springer
“App stores” are online software stores where end users may browse, purchase,
download, and install software applications. By far, the best known app stores are …

Expert perspectives on explainability

J Cito, S Chandra, C Tantithamthavorn… - IEEE …, 2023 - research.monash.edu
(IEEE Software, May/June 2023, p. 85) … for changes, and these sorts of signals. We are also
looking to capture discussions that happen around code in internal discussion forums (a bit …

Root cause analysis of anomalies based on graph convolutional neural network

Z Li, Y Tu, Z Ma - International Journal of Software Engineering and …, 2022 - World Scientific
With the gradual increase of network complexity and network scale in the cloud
environment, Root Cause Analysis (RCA) of node failures has become a systematic problem …

Trace-based Multi-Dimensional Root Cause Localization of Performance Issues in Microservice Systems

C Zhang, Z Dong, X Peng, B Zhang… - Proceedings of the IEEE …, 2024 - dl.acm.org
Modern microservice systems have become increasingly complicated due to the dynamic
and complex interactions and runtime environment. It leads to the system vulnerable to …