REPT: Reverse debugging of failures in deployed software

W Cui, X Ge, B Kasikci, B Niu, U Sharma… - … USENIX Symposium on …, 2018 - usenix.org
Debugging software failures in deployed systems is important because they impact real
users and customers. However, debugging such failures is notoriously hard in practice …

Failure sketching: A technique for automated root cause diagnosis of in-production failures

B Kasikci, B Schubert, C Pereira, G Pokam… - Proceedings of the 25th …, 2015 - dl.acm.org
Developers spend a lot of time searching for the root causes of software failures. For this,
they traditionally try to reproduce those failures, but unfortunately many failures are so hard …

Execution reconstruction: Harnessing failure reoccurrences for failure reproduction

G Zuo, J Ma, A Quinn, P Bhatotia, P Fonseca… - Proceedings of the …, 2021 - dl.acm.org
Reproducing production failures is crucial for software reliability. Alas, existing bug
reproduction approaches are not suitable for production systems because they are not …

Selective mutation testing for concurrent code

M Gligoric, L Zhang, C Pereira, G Pokam - Proceedings of the 2013 …, 2013 - dl.acm.org
Concurrent code is becoming increasingly important with the advent of multi-cores, but
testing concurrent code is challenging. Researchers are developing new testing techniques …

Lazy diagnosis of in-production concurrency bugs

B Kasikci, W Cui, X Ge, B Niu - Proceedings of the 26th Symposium on …, 2017 - dl.acm.org
Diagnosing concurrency bugs, the process of understanding the root causes of
concurrency failures, is hard. Developers depend on reproducing concurrency bugs to …

Vidi: Record replay for reconfigurable hardware

G Zuo, J Ma, A Quinn, B Kasikci - Proceedings of the 28th ACM …, 2023 - dl.acm.org
Developers are turning to heterogeneous computing devices, such as Field Programmable
Gate Arrays (FPGAs), to accelerate data center and cloud computing workloads. FPGAs …

QuickRec: Prototyping an Intel architecture extension for record and replay of multithreaded programs

G Pokam, K Danne, C Pereira, R Kassa… - Proceedings of the 40th …, 2013 - dl.acm.org
There has been significant interest in hardware-assisted deterministic Record and Replay
(RnR) systems for multithreaded programs on multiprocessors. However, no proposal has …

Efficient Reproduction of Fault-Induced Failures in Distributed Systems with Feedback-Driven Fault Injection

J Pan, H Wu, T Leesatapornwongsa, S Nath… - Proceedings of the …, 2024 - dl.acm.org
Debugging a failure usually requires reproducing it first. This can be hard for failures in
production distributed systems, where bugs are exposed only by some unusual faulty …

Replay debugging: Leveraging record and replay for program debugging

N Honarmand, J Torrellas - ACM SIGARCH Computer Architecture News, 2014 - dl.acm.org
Hardware-assisted Record and Deterministic Replay (RnR) of programs has been proposed
as a primitive for debugging hard-to-repeat software bugs. However, simply providing …

Alligator in Vest: A Practical Failure-Diagnosis Framework via Arm Hardware Features

Y Zhang, Y Hu, H Li, W Shi, Z Ning, X Luo… - Proceedings of the 32nd …, 2023 - dl.acm.org
Failure diagnosis in practical systems is difficult, and the main obstacle is that the
information a developer has access to is limited. This information is usually not enough to …