Deep learning for anomaly detection in time-series data: Review, analysis, and guidelines

K Choi, J Yi, C Park, S Yoon - IEEE Access, 2021 - ieeexplore.ieee.org
As industries become automated and connectivity technologies advance, a wide range of
systems continues to generate massive amounts of data. Many approaches have been …

A survey on data-driven predictive maintenance for the railway industry

N Davari, B Veloso, GA Costa, PM Pereira, RP Ribeiro… - Sensors, 2021 - mdpi.com
In the last few years, many works have addressed Predictive Maintenance (PdM) by the use
of Machine Learning (ML) and Deep Learning (DL) solutions, especially the latter. The …

Towards intelligent incident management: why we need it and how we make it

Z Chen, Y Kang, L Li, X Zhang, H Zhang, H Xu… - Proceedings of the 28th …, 2020 - dl.acm.org
The management of cloud service incidents (unplanned interruptions or outages of a
service/product) greatly affects customer satisfaction and business revenue. After years of …

LogTransfer: Cross-system log anomaly detection for software systems with transfer learning

R Chen, S Zhang, D Li, Y Zhang, F Guo… - 2020 IEEE 31st …, 2020 - ieeexplore.ieee.org
System logs, which describe a variety of events of software systems, are becoming
increasingly popular for anomaly detection. However, for a large software system, current …

Jump-Starting multivariate time series anomaly detection for online service systems

M Ma, S Zhang, J Chen, J Xu, H Li, Y Lin… - 2021 USENIX Annual …, 2021 - usenix.org
With the booming of online service systems, anomaly detection on multivariate time series,
such as a combination of CPU utilization, average response time, and requests per second …

AI for IT operations (AIOps) on cloud platforms: Reviews, opportunities and challenges

Q Cheng, D Sahoo, A Saha, W Yang, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big
data generated by IT Operations processes, particularly in cloud infrastructures, to provide …

Efficient KPI anomaly detection through transfer learning for large-scale web services

S Zhang, Z Zhong, D Li, Q Fan, Y Sun… - IEEE Journal on …, 2022 - ieeexplore.ieee.org
Timely anomaly detection of key performance indicators (KPIs), e.g., service response time,
error rate, is of utmost importance to Web services. Over the years, many unsupervised deep …

LogClass: Anomalous log identification and classification with partial labels

W Meng, Y Liu, S Zhang, F Zaiter… - … on Network and …, 2021 - ieeexplore.ieee.org
Logs are imperative in the management process of networks and services. However,
manually identifying and classifying anomalous logs is time-consuming, error-prone, and …

tprof: Performance profiling via structural aggregation and automated analysis of distributed systems traces

L Huang, T Zhu - Proceedings of the ACM Symposium on Cloud …, 2021 - dl.acm.org
The traditional approach for performance debugging relies upon performance profilers (e.g.,
gprof, VTune) that provide average function runtime information. These aggregate statistics …

RLAD: Time series anomaly detection through reinforcement learning and active learning

T Wu, J Ortiz - arXiv preprint arXiv:2104.00543, 2021 - arxiv.org
We introduce a new semi-supervised, time series anomaly detection algorithm that uses
deep reinforcement learning (DRL) and active learning to efficiently learn and adapt to …