REPT: Reverse debugging of failures in deployed software
Debugging software failures in deployed systems is important because they impact real
users and customers. However, debugging such failures is notoriously hard in practice …
Failure sketching: A technique for automated root cause diagnosis of in-production failures
Developers spend a lot of time searching for the root causes of software failures. For this,
they traditionally try to reproduce those failures, but unfortunately many failures are so hard …
Execution reconstruction: Harnessing failure reoccurrences for failure reproduction
Reproducing production failures is crucial for software reliability. Alas, existing bug
reproduction approaches are not suitable for production systems because they are not …
Selective mutation testing for concurrent code
Concurrent code is becoming increasingly important with the advent of multi-cores, but
testing concurrent code is challenging. Researchers are developing new testing techniques …
Lazy diagnosis of in-production concurrency bugs
Diagnosing concurrency bugs (the process of understanding the root causes of
concurrency failures) is hard. Developers depend on reproducing concurrency bugs to …
Vidi: Record replay for reconfigurable hardware
Developers are turning to heterogeneous computing devices, such as Field Programmable
Gate Arrays (FPGAs), to accelerate data center and cloud computing workloads. FPGAs …
QuickRec: Prototyping an Intel architecture extension for record and replay of multithreaded programs
There has been significant interest in hardware-assisted deterministic Record and Replay
(RnR) systems for multithreaded programs on multiprocessors. However, no proposal has …
Efficient Reproduction of Fault-Induced Failures in Distributed Systems with Feedback-Driven Fault Injection
Debugging a failure usually requires reproducing it first. This can be hard for failures in
production distributed systems, where bugs are exposed only by some unusual faulty …
Replay debugging: Leveraging record and replay for program debugging
N Honarmand, J Torrellas - ACM SIGARCH Computer Architecture News, 2014 - dl.acm.org
Hardware-assisted Record and Deterministic Replay (RnR) of programs has been proposed
as a primitive for debugging hard-to-repeat software bugs. However, simply providing …
Alligator in Vest: A Practical Failure-Diagnosis Framework via Arm Hardware Features
Failure diagnosis in practical systems is difficult, and the main obstacle is that the
information a developer has access to is limited. This information is usually not enough to …