Performance anomaly detection and bottleneck identification
O Ibidunmoye, F Hernández-Rodriguez… - ACM Computing Surveys …, 2015 - dl.acm.org
In order to meet stringent performance requirements, system administrators must effectively
detect undesirable performance behaviours, identify potential root causes, and take …
MicroRCA: Root cause localization of performance issues in microservices
Software architecture is undergoing a transition from monolithic architectures to
microservices to achieve resilience, agility and scalability in software development …
A survey of AIOps methods for failure management
Modern society is increasingly moving toward complex and distributed computing systems.
The increase in scale and complexity of these systems challenges O&M teams that perform …
Microscope: Pinpoint performance issues with causal graphs in micro-service environments
Driven by the emerging business models (e.g., digital sales) and IT technologies (e.g., DevOps
and Cloud computing), the architecture of software is shifting from monolithic to microservice …
Failure diagnosis in microservice systems: A comprehensive survey and analysis
Modern microservice systems have gained widespread adoption due to their high
scalability, flexibility, and extensibility. However, the characteristics of independent …
Localizing faults in cloud systems
By leveraging large clusters of commodity hardware, the Cloud offers great opportunities to
optimize the operative costs of software systems, but impacts significantly on the reliability of …
CauseInfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems
Modern applications especially cloud-based or cloud-centric applications always have many
components running in the large distributed environment with complex interactions. They …
Augmenting simulated annealing to build interaction test suites
MB Cohen, CJ Colbourn… - … Symposium on Software …, 2003 - ieeexplore.ieee.org
Component based software development is prone to unexpected interaction faults. The goal
is to test as many potential interactions as is feasible within time and budget constraints …
tprof: Performance profiling via structural aggregation and automated analysis of distributed systems traces
The traditional approach for performance debugging relies upon performance profilers (e.g.,
gprof, VTune) that provide average function runtime information. These aggregate statistics …
Mitigating interference in cloud services by middleware reconfiguration
Application performance has been and remains one of top five concerns since the inception
of cloud computing. A primary determinant of application performance is multi-tenancy or …