Groot: An event-graph-based approach for root cause analysis in industrial settings
For large-scale distributed systems, it is crucial to efficiently diagnose the root causes of
incidents to maintain high system availability. The recent development of microservice …
Big data analytics over encrypted datasets with Seabed
Today, enterprises collect large amounts of data and leverage the cloud to perform analytics
over this data. Since the data is often sensitive, enterprises would prefer to keep it …
CFA: A practical prediction system for video QoE optimization
Many prior efforts have suggested that Internet video Quality of Experience (QoE) could be
dramatically improved by using data-driven prediction of video quality for different choices …
Hotspot: Anomaly localization for additive KPIs with multi-dimensional attributes
Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error
count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are …
Fighting the fog of war: Automated incident detection for cloud systems
Incidents and outages dramatically degrade the availability of large-scale cloud computing
systems such as AWS, Azure, and GCP. In current incident response practice, each team …
Logrule: Efficient structured log mining for root cause analysis
Accurate, timely Root Cause Analysis (RCA) is essential to successful IT operations as a
primary step to incident remediation. RCA automation using data mining techniques in large …
Constructing large-scale real-world benchmark datasets for AIOps
Recently, AIOps (Artificial Intelligence for IT Operations) has been well studied in academia
and industry to enable automated and effective software service management. Plenty of …
Generic and robust localization of multi-dimensional root causes
Operators of online software services periodically collect various measures with many
attributes. When a measure becomes abnormal, indicating service problems such as …
Rapid and robust impact assessment of software changes in large internet-based services
The detection of performance changes in software change roll-outs in Internet-based
services is crucial for an operations team, because it allows timely roll-back of a software …
Funnel: Assessing software changes in web-based services
The detection of performance changes in software change roll-outs in Internet-based
services is crucial for an operations team, because it allows timely roll-back of a software …