A survey of online failure prediction methods
F Salfner, M Lenk, M Malek - ACM Computing Surveys (CSUR), 2010 - dl.acm.org
With the ever-growing complexity and dynamicity of computer systems, proactive fault
management is an effective approach to enhancing availability. Online failure prediction is …
PreFix: Switch failure prediction in datacenter networks
In modern datacenter networks (DCNs), failures of network devices are the norm rather than
the exception, and many research efforts have focused on dealing with failures after they …
Outage prediction and diagnosis for cloud service systems
With the rapid growth of cloud service systems and their increasing complexity, service
failures become unavoidable. Outages, which are critical service failures, could dramatically …
Using hidden semi-Markov models for effective online failure prediction
F Salfner, M Malek - 2007 26th IEEE International Symposium …, 2007 - ieeexplore.ieee.org
A proactive handling of faults requires that the risk of upcoming failures is continuously
assessed. One of the promising approaches is online failure prediction, which means that …
A best practice guide to resource forecasting for computing systems
GA Hoffmann, KS Trivedi… - IEEE Transactions on …, 2007 - ieeexplore.ieee.org
Recently, measurement-based studies of software systems have proliferated, reflecting an
increasingly empirical focus on system availability, reliability, aging, and fault tolerance …
Photometric stereo with near point lighting: A solution by mesh deformation
We tackle the problem of photometric stereo under near point lighting in this paper. Different
from the conventional formulation of photometric stereo that assumes parallel lighting …
Biglog: Unsupervised large-scale pre-training for a unified log representation
Automated log analysis has been widely applied in modern data-center network, performing
critical tasks such as log parsing, log anomaly detection and log-based failure prediction …
Quantifying temporal and spatial correlation of failure events for proactive management
Networked computing systems continue to grow in scale and in the complexity of their
components and interactions. Component failures become norms instead of exceptions in …
Online failure prediction for railway transportation systems based on fuzzy rules and data analysis
Nowadays, software systems have been more and more complex, which causes great
challenges to maintain the availability of the systems. Online failure prediction provides an …
A practical approach for generating failure data for assessing and comparing failure prediction algorithms
Failure Prediction allows improving the dependability of computer systems, but its use is still
uncommon due to scarcity of failure-related data that can be used for training, assessing and …