Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey
The proliferation of services and service interactions within microservices and cloud-native
applications makes it harder to detect failures and to identify their possible root causes …
Sage: practical and scalable ML-driven performance debugging in microservices
Cloud applications are increasingly shifting from large monolithic services to complex
graphs of loosely-coupled microservices. Despite the advantages of modularity and …
Root cause analysis of failures in microservices through causal discovery
Most cloud applications use a large number of smaller sub-components (called
microservices) that interact with each other in the form of a complex graph to provide the …
Microscope: Pinpoint performance issues with causal graphs in micro-service environments
Driven by the emerging business models (e.g., digital sales) and IT technologies (e.g., DevOps
and Cloud computing), the architecture of software is shifting from monolithic to microservice …
Localizing failure root causes in a microservice through causality inference
An increasing number of Internet applications are applying microservice architecture due to
its flexibility and clear logic. The stability of microservice is thus vitally important for these …
A spatiotemporal deep learning approach for unsupervised anomaly detection in cloud systems
Anomaly detection is a critical task for maintaining the performance of a cloud system. Using
data-driven methods to address this issue is the mainstream in recent years. However, due …
MicroRank: End-to-end latency issue localization with extended spectrum analysis in microservice environments
With the advantages of flexible scalability and fast delivery, microservice has become a
popular software architecture in the modern IT industry. However, the explosion in the …
Groot: An event-graph-based approach for root cause analysis in industrial settings
For large-scale distributed systems, it is crucial to efficiently diagnose the root causes of
incidents to maintain high system availability. The recent development of microservice …
MicroHECL: High-efficient root cause localization in large-scale microservice systems
Availability issues of industrial microservice systems (e.g., drop of successfully placed orders
and processed transactions) directly affect the running of the business. These issues are …
Failure diagnosis in microservice systems: A comprehensive survey and analysis
Modern microservice systems have gained widespread adoption due to their high
scalability, flexibility, and extensibility. However, the characteristics of independent …