Fault tolerance in big data storage and processing systems: A review on challenges and solutions
Big data systems are sufficiently stable to store and process a massive volume of rapidly
changing data. However, big data systems are composed of large-scale hardware resources …
Machine learning job failure analysis and prediction model for the cloud environment
Reliable and accessible cloud applications are essential for the future of ubiquitous
computing, smart appliances, and electronic health. Owing to the vastness and diversity of …
Cloud failure prediction based on traditional machine learning and deep learning
Cloud failure is one of the critical issues since it can cost millions of dollars to cloud service
providers, in addition to the loss of productivity suffered by industrial users. Fault tolerance …
Predicting faults in high performance computing systems: An in-depth survey of the state-of-the-practice
As we near exascale, resilience remains a major technical hurdle. Any technique with the
goal of achieving resilience suffers from having to be reactive, as failures can appear at any …
Analysis of job failure and prediction model for cloud computing using machine learning
MS Jassas, QH Mahmoud - Sensors, 2022 - mdpi.com
Modern applications, such as smart cities, home automation, and eHealth, demand a new
approach to improve cloud application dependability and availability. Due to the enormous …
A dynamic and failure-aware task scheduling framework for hadoop
Hadoop has become a popular framework for processing data-intensive applications in
cloud environments. A core constituent of Hadoop is the scheduler, which is responsible for …
Proactive failure-aware task scheduling framework for cloud computing
Y Alahmad, T Daradkeh, A Agarwal - IEEE Access, 2021 - ieeexplore.ieee.org
Cloud computing is a widely adopted platform for executing tasks of different application
types that belong to the end users. In the cloud, application task is prone to failure for several …
Time machine: generative real-time model for failure (and lead time) prediction in hpc systems
High Performance Computing (HPC) systems generate a large amount of unstructured/
alphanumeric log messages that capture the health state of their components. Due to their …
Task Failure Prediction Using Machine Learning Techniques in the Google Cluster Trace Cloud Computing Environment.
M Gollapalli, MA AlMetrik… - Mathematical …, 2022 - search.ebscohost.com
Cloud computing has grown into a critical technology by enabling ground-breaking
capabilities for Internet-dependent computer platforms and software applications. As cloud …
Analyzing the impact of various parameters on job scheduling in the Google cluster dataset
D Shahmirzadi, N Khaledian, AM Rahmani - Cluster Computing, 2024 - Springer
Cloud architecture and its operations interest both general consumers and researchers.
Google, as a technology giant, offers cloud services globally. This paper analyzes the …