Why does the cloud stop computing? Lessons from hundreds of service outages

HS Gunawi, M Hao, RO Suminto, A Laksono… - Proceedings of the …, 2016 - dl.acm.org
We conducted a cloud outage study (COS) of 32 popular Internet services. We analyzed
1247 headline news and public post-mortem reports that detail 597 unplanned outages that …

An empirical study on configuration errors in commercial and open source systems

Z Yin, X Ma, J Zheng, Y Zhou… - Proceedings of the …, 2011 - dl.acm.org
Configuration errors (i.e., misconfigurations) are among the dominant causes of system
failures. Their importance has inspired many research efforts on detecting, diagnosing, and …

The attack of the clones: A study of the impact of shared code on vulnerability patching

A Nappa, R Johnson, L Bilge… - … IEEE symposium on …, 2015 - ieeexplore.ieee.org
Vulnerability exploits remain an important mechanism for malware delivery, despite efforts to
speed up the creation of patches and improvements in software updating mechanisms …

KATCH: High-coverage testing of software patches

PD Marinescu, C Cadar - Proceedings of the 2013 9th Joint Meeting on …, 2013 - dl.acm.org
One of the distinguishing characteristics of software systems is that they evolve: new patches
are committed to software repositories and new versions are released to users on a …

Cloud software upgrades: Challenges and opportunities

I Neamtiu, T Dumitraş - … on the Maintenance and Evolution of …, 2011 - ieeexplore.ieee.org
The fast evolution pace for cloud computing software is on a collision course with our
growing reliance on cloud computing. On one hand, cloud software must have the agility to …

Automatic error elimination by horizontal code transfer across multiple applications

S Sidiroglou-Douskos, E Lahtinen, F Long… - Proceedings of the 36th …, 2015 - dl.acm.org
We present Code Phage (CP), a system for automatically transferring correct code from
donor applications into recipient applications that process the same inputs to successfully …

Inter-disciplinary research challenges in computer systems for the 2020s

A Cohen, X Shen, J Torrellas, J Tuck, Y Zhou, S Adve… - 2018 - research.csc.ncsu.edu
The broad landscape of new technologies currently being explored makes the current times
very exciting for computer systems research. The community is actively researching an …

Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis

M Farshchi, JG Schneider, I Weber… - 2015 IEEE 26th …, 2015 - ieeexplore.ieee.org
Failure of application operations is one of the main causes of system-wide outages in cloud
environments. This particularly applies to DevOps operations, such as backup …

Safe software updates via multi-version execution

P Hosek, C Cadar - 2013 35th International Conference on …, 2013 - ieeexplore.ieee.org
Software systems are constantly evolving, with new versions and patches being released on
a continuous basis. Unfortunately, software updates present a high risk, with many releases …

Keepers of the machines: Examining how system administrators manage software updates for multiple machines

F Li, L Rogers, A Mathur, N Malkin… - Fifteenth Symposium on …, 2019 - usenix.org
Keeping machines updated is crucial for maintaining system security. While recent studies
have investigated the software updating practices of end users, system administrators have …