Adaptive resource efficient microservice deployment in cloud-edge continuum

K Fu, W Zhang, Q Chen, D Zeng… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
User-facing services are now evolving towards the microservice architecture where a
service is built by connecting multiple microservice stages. Since the entire service is heavy …

Root cause analysis of failures in microservices through causal discovery

A Ikram, S Chakraborty, S Mitra… - Advances in …, 2022 - proceedings.neurips.cc
Most cloud applications use a large number of smaller sub-components (called
microservices) that interact with each other in the form of a complex graph to provide the …

Eadro: An end-to-end troubleshooting framework for microservices on multi-source data

C Lee, T Yang, Z Chen, Y Su… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
The complexity and dynamism of microservices pose significant challenges to system
reliability, and thereby, automated troubleshooting is crucial. Effective root cause localization …

CRISP: Critical path analysis of Large-Scale microservice architectures

Z Zhang, MK Ramanathan, P Raj, A Parwal… - 2022 USENIX Annual …, 2022 - usenix.org
Microservice architectures have become the lifeblood of modern service-oriented software
systems. Remote Procedure Calls (RPCs) among microservices are deeply nested …

Designing cloud servers for lower carbon

J Wang, DS Berger, F Kazhamiaka… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
To mitigate climate change, we must reduce carbon emissions from hyperscale cloud
computing. We find that cloud compute servers cause the majority of emissions in a general …

The power of prediction: microservice auto scaling via workload learning

S Luo, H Xu, K Ye, G Xu, L Zhang, G Yang… - Proceedings of the 13th …, 2022 - dl.acm.org
When deploying microservices in production clusters, it is critical to automatically scale
containers to improve cluster utilization and ensure service level agreements (SLA) …

Causal inference-based root cause analysis for online service systems with intervention recognition

M Li, Z Li, K Yin, X Nie, W Zhang, K Sui… - Proceedings of the 28th …, 2022 - dl.acm.org
Fault diagnosis is critical in many domains, as faults may lead to safety threats or economic
losses. In the field of online service systems, operators rely on enormous monitoring data to …

Lifting the veil on Meta's microservice architecture: Analyses of topology and request workflows

D Huye, Y Shkuro, RR Sambasivan - 2023 USENIX Annual Technical …, 2023 - usenix.org
The microservice architecture is a novel paradigm for building and operating distributed
applications in many organizations. This paradigm changes many aspects of how distributed …

Actionable and interpretable fault localization for recurring failures in online service systems

Z Li, N Zhao, M Li, X Lu, L Wang, D Chang… - Proceedings of the 30th …, 2022 - dl.acm.org
Fault localization is challenging in an online service system due to its monitoring data's large
volume and variety and complex dependencies across/within its components (e.g., services …

Aquatope: Qos-and-uncertainty-aware resource management for multi-stage serverless workflows

Z Zhou, Y Zhang, C Delimitrou - Proceedings of the 28th ACM …, 2022 - dl.acm.org
Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are
becoming increasingly representative of FaaS platforms. Despite their advantages in terms …