A survey of online failure prediction methods

F Salfner, M Lenk, M Malek - ACM Computing Surveys (CSUR), 2010 - dl.acm.org
With the ever-growing complexity and dynamicity of computer systems, proactive fault
management is an effective approach to enhancing availability. Online failure prediction is …

PreFix: Switch failure prediction in datacenter networks

S Zhang, Y Liu, W Meng, Z Luo, J Bu, S Yang… - Proceedings of the …, 2018 - dl.acm.org
In modern datacenter networks (DCNs), failures of network devices are the norm rather than
the exception, and many research efforts have focused on dealing with failures after they …

Outage prediction and diagnosis for cloud service systems

Y Chen, X Yang, Q Lin, H Zhang, F Gao, Z Xu… - The world wide web …, 2019 - dl.acm.org
With the rapid growth of cloud service systems and their increasing complexity, service
failures become unavoidable. Outages, which are critical service failures, could dramatically …

Using hidden semi-Markov models for effective online failure prediction

F Salfner, M Malek - 2007 26th IEEE International Symposium …, 2007 - ieeexplore.ieee.org
A proactive handling of faults requires that the risk of upcoming failures is continuously
assessed. One of the promising approaches is online failure prediction, which means that …

A best practice guide to resource forecasting for computing systems

GA Hoffmann, KS Trivedi… - IEEE Transactions on …, 2007 - ieeexplore.ieee.org
Recently, measurement-based studies of software systems have proliferated, reflecting an
increasingly empirical focus on system availability, reliability, aging, and fault tolerance …

Photometric stereo with near point lighting: A solution by mesh deformation

W Xie, C Dai, CCL Wang - Proceedings of the IEEE …, 2015 - openaccess.thecvf.com
We tackle the problem of photometric stereo under near point lighting in this paper. Different
from the conventional formulation of photometric stereo that assumes parallel lighting …

BigLog: Unsupervised large-scale pre-training for a unified log representation

S Tao, Y Liu, W Meng, Z Ren, H Yang… - 2023 IEEE/ACM 31st …, 2023 - ieeexplore.ieee.org
Automated log analysis has been widely applied in modern data-center network, performing
critical tasks such as log parsing, log anomaly detection and log-based failure prediction …

Quantifying temporal and spatial correlation of failure events for proactive management

S Fu, CZ Xu - 2007 26th IEEE International Symposium on …, 2007 - ieeexplore.ieee.org
Networked computing systems continue to grow in scale and in the complexity of their
components and interactions. Component failures become norms instead of exceptions in …

Online failure prediction for railway transportation systems based on fuzzy rules and data analysis

Z Ding, Y Zhou, G Pu, MC Zhou - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Nowadays, software systems have been more and more complex, which causes great
challenges to maintain the availability of the systems. Online failure prediction provides an …

A practical approach for generating failure data for assessing and comparing failure prediction algorithms

I Irrera, M Vieira - 2014 IEEE 20th Pacific Rim International …, 2014 - ieeexplore.ieee.org
Failure Prediction allows improving the dependability of computer systems, but its use is still
uncommon due to scarcity of failure-related data that can be used for training, assessing and …