Performance anomaly detection and bottleneck identification

O Ibidunmoye, F Hernández-Rodriguez… - ACM Computing Surveys …, 2015 - dl.acm.org
In order to meet stringent performance requirements, system administrators must effectively
detect undesirable performance behaviours, identify potential root causes, and take …

MicroRCA: Root cause localization of performance issues in microservices

L Wu, J Tordsson, E Elmroth… - NOMS 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Software architecture is undergoing a transition from monolithic architectures to
microservices to achieve resilience, agility and scalability in software development …

A survey of AIOps methods for failure management

P Notaro, J Cardoso, M Gerndt - ACM Transactions on Intelligent …, 2021 - dl.acm.org
Modern society is increasingly moving toward complex and distributed computing systems.
The increase in scale and complexity of these systems challenges O&M teams that perform …

Microscope: Pinpoint performance issues with causal graphs in micro-service environments

JJ Lin, P Chen, Z Zheng - … , ICSOC 2018, Hangzhou, China, November 12 …, 2018 - Springer
Driven by the emerging business models (e.g., digital sales) and IT technologies (e.g., DevOps
and Cloud computing), the architecture of software is shifting from monolithic to microservice …

Failure diagnosis in microservice systems: A comprehensive survey and analysis

S Zhang, S Xia, W Fan, B Shi, X Xiong, Z Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern microservice systems have gained widespread adoption due to their high
scalability, flexibility, and extensibility. However, the characteristics of independent …

Localizing faults in cloud systems

L Mariani, C Monni, M Pezzé… - 2018 IEEE 11th …, 2018 - ieeexplore.ieee.org
By leveraging large clusters of commodity hardware, the Cloud offers great opportunities to
optimize the operative costs of software systems, but impacts significantly on the reliability of …

CauseInfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems

P Chen, Y Qi, P Zheng, D Hou - IEEE INFOCOM 2014-IEEE …, 2014 - ieeexplore.ieee.org
Modern applications especially cloud-based or cloud-centric applications always have many
components running in the large distributed environment with complex interactions. They …

Augmenting simulated annealing to build interaction test suites

MB Cohen, CJ Colbourn… - … Symposium on Software …, 2003 - ieeexplore.ieee.org
Component based software development is prone to unexpected interaction faults. The goal
is to test as many potential interactions as is feasible within time and budget constraints …

tprof: Performance profiling via structural aggregation and automated analysis of distributed systems traces

L Huang, T Zhu - Proceedings of the ACM Symposium on Cloud …, 2021 - dl.acm.org
The traditional approach for performance debugging relies upon performance profilers (e.g.,
gprof, VTune) that provide average function runtime information. These aggregate statistics …

Mitigating interference in cloud services by middleware reconfiguration

AK Maji, S Mitra, B Zhou, S Bagchi… - Proceedings of the 15th …, 2014 - dl.acm.org
Application performance has been and remains one of the top five concerns since the inception
of cloud computing. A primary determinant of application performance is multi-tenancy or …