AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges

Q Cheng, D Sahoo, A Saha, W Yang, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big
data generated by IT Operations processes, particularly in cloud infrastructures, to provide …

CausalRCA: Causal inference based precise fine-grained root cause localization for microservice applications

R Xin, P Chen, Z Zhao - Journal of Systems and Software, 2023 - Elsevier
Effectively localizing root causes of performance anomalies is crucial to enabling the rapid
recovery and loss mitigation of microservice applications in the cloud. Depending on the …

Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective

Z Wang, C Pei, M Ma, X Wang, Z Li, D Pei… - Proceedings of the …, 2024 - dl.acm.org
Time series Anomaly Detection (AD) plays a crucial role for web systems. Various web
systems rely on time series data to monitor and identify anomalies in real time, as well as to …

AutoKAD: Empowering KPI Anomaly Detection with Label-Free Deployment

Z Yu, C Pei, S Zhang, X Wen, J Li… - 2023 IEEE 34th …, 2023 - ieeexplore.ieee.org
Monitoring Key Performance Indicators (KPIs) and detecting anomalies in online service
systems is critical. However, choosing the right KPI anomaly detection algorithm and …

LWS: a framework for log-based workload simulation in session-based SUT

Y Han, Q Du, J Xu, S Zhao, Z Chen, L Cao, K Yin… - Journal of Systems and …, 2023 - Elsevier
Artificial intelligence for IT Operations (AIOps) plays a critical role in operating and managing
cloud-native systems and microservice-based applications but is limited by the lack of high …

Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection

J Liu, C Zhang, J Qian, M Ma, S Qin, C Bansal… - arXiv preprint arXiv …, 2024 - arxiv.org
Time series anomaly detection (TSAD) plays a crucial role in various industries by
identifying atypical patterns that deviate from standard trends, thereby maintaining system …

Learning to Diagnose: Meta-Learning for Efficient Adaptation in Few-Shot AIOps Scenarios

Y Duan, H Bao, G Bai, Y Wei, K Xue, Z You, Y Zhang… - Electronics, 2024 - mdpi.com
With the advancement of technologies like 5G, cloud computing, and microservices, the
complexity of network management systems and the variety of technical components have …

Indicator Fault Detection Method Based on Periodic Self Discovery and Historical Anomaly Filtering

S Wu, J Guan - IEEE Access, 2024 - ieeexplore.ieee.org
Data centers' information systems typically encompass a variety of operational objects
including applications, systems, networks, and devices, which generate a large volume of …

LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis

L Zheng, Z Chen, D Wang, C Deng, R Matsuoka… - arXiv preprint arXiv …, 2024 - arxiv.org
Root cause analysis (RCA) is crucial for enhancing the reliability and performance of
complex systems. However, progress in this field has been hindered by the lack of large …

StreamAD: A cloud platform metrics-oriented benchmark for unsupervised online anomaly detection

J Xu, C Lin, F Liu, Y Wang, W Xiong, Z Li… - BenchCouncil …, 2023 - Elsevier
Cloud platforms, serving as fundamental infrastructure, play a significant role in developing
modern applications. In recent years, there has been growing interest among researchers in …