Groot: An event-graph-based approach for root cause analysis in industrial settings

H Wang, Z Wu, H Jiang, Y Huang… - 2021 36th IEEE/ACM …, 2021 - ieeexplore.ieee.org
For large-scale distributed systems, it is crucial to efficiently diagnose the root causes of
incidents to maintain high system availability. The recent development of microservice …

Big data analytics over encrypted datasets with Seabed

A Papadimitriou, R Bhagwan, N Chandran… - … USENIX symposium on …, 2016 - usenix.org
Today, enterprises collect large amounts of data and leverage the cloud to perform analytics
over this data. Since the data is often sensitive, enterprises would prefer to keep it …

CFA: A practical prediction system for video QoE optimization

J Jiang, V Sekar, H Milner, D Shepherd… - … USENIX Symposium on …, 2016 - usenix.org
Many prior efforts have suggested that Internet video Quality of Experience (QoE) could be
dramatically improved by using data-driven prediction of video quality for different choices …

HotSpot: Anomaly localization for additive KPIs with multi-dimensional attributes

Y Sun, Y Zhao, Y Su, D Liu, X Nie, Y Meng… - IEEE …, 2018 - ieeexplore.ieee.org
Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error
count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are …

Fighting the fog of war: Automated incident detection for cloud systems

L Li, X Zhang, X Zhao, H Zhang, Y Kang… - 2021 USENIX Annual …, 2021 - usenix.org
Incidents and outages dramatically degrade the availability of large-scale cloud computing
systems such as AWS, Azure, and GCP. In current incident response practice, each team …

LogRule: Efficient structured log mining for root cause analysis

P Notaro, S Haeri, J Cardoso… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Accurate, timely Root Cause Analysis (RCA) is essential to successful IT operations as a
primary step to incident remediation. RCA automation using data mining techniques in large …

Constructing large-scale real-world benchmark datasets for AIOps

Z Li, N Zhao, S Zhang, Y Sun, P Chen, X Wen… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, AIOps (Artificial Intelligence for IT Operations) has been well studied in academia
and industry to enable automated and effective software service management. Plenty of …

Generic and robust localization of multi-dimensional root causes

Z Li, C Luo, Y Zhao, Y Sun, K Sui… - 2019 IEEE 30th …, 2019 - ieeexplore.ieee.org
Operators of online software services periodically collect various measures with many
attributes. When a measure becomes abnormal, indicating service problems such as …

Rapid and robust impact assessment of software changes in large internet-based services

S Zhang, Y Liu, D Pei, Y Chen, X Qu, S Tao… - Proceedings of the 11th …, 2015 - dl.acm.org
The detection of performance changes in software change roll-outs in Internet-based
services is crucial for an operations team, because it allows timely roll-back of a software …

Funnel: Assessing software changes in web-based services

S Zhang, Y Liu, D Pei, Y Chen, X Qu… - IEEE Transactions …, 2016 - ieeexplore.ieee.org
The detection of performance changes in software change roll-outs in Internet-based
services is crucial for an operations team, because it allows timely roll-back of a software …