A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems

IP Egwutuoha, D Levy, B Selic, S Chen - The Journal of Supercomputing, 2013 - Springer
Abstract In recent years, High Performance Computing (HPC) systems have been shifting
from expensive massively parallel architectures to clusters of commodity PCs to take …

A roadmap toward the resilient internet of things for cyber-physical systems

D Ratasich, F Khalid, F Geissler, R Grosu… - IEEE …, 2019 - ieeexplore.ieee.org
The Internet of Things (IoT) is a ubiquitous system connecting many different devices - the
things - which can be accessed from the distance. The cyber-physical systems (CPSs) …

Methods, media and systems for responding to a denial of service attack

A Stavrou, AD Keromytis, J Nieh, V Misra… - US Patent …, 2013 - Google Patents
Methods, media and systems for responding to a Denial of Service (DoS) attack are
provided. In some embodiments, a method includes detecting a DoS attack, migrating one or …

Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating Systems.

O Laadan, J Nieh - USENIX Annual Technical Conference, 2007 - usenix.org
The ability to checkpoint a running application and restart it later can provide many useful
benefits including fault recovery, advanced resources sharing, dynamic load balancing and …

Systems, methods, means, and media for recording, searching, and outputting display information

R Baratto, O Laadan, D Phung, SJ Potter… - US Patent …, 2012 - Google Patents
A portion of the disclosure of this patent document contains material which is subject to
copyright protection. The copyright owner has no objection to the facsimile reproduction by …

Fault tolerance to balance for messaging layers in communication society

A Mikhail, HH Kareem… - … International conference on …, 2017 - ieeexplore.ieee.org
The present communication societies are based on use of High-Performance Computing
(HPC) systems for balancing the messaging layers. However, the HPC systems are …

Methods, media and systems for managing a distributed application running in a plurality of digital processing devices

O Laadan, J Nieh, D Phung - US Patent 8,280,944, 2012 - Google Patents
Methods, media and systems for managing a distributed application running in a plurality of
digital processing devices are provided. In some embodiments, a method includes running …

Linux-CR: Transparent application checkpoint-restart in Linux

O Laadan, SE Hallyn - Linux Symposium, 2010 - Citeseer
Application checkpoint-restart is the ability to save the state of a running application so that it
can later resume its execution from the time of the checkpoint. Application checkpoint-restart …

Flux: Multi-surface computing in Android

A Van't Hof, H Jamjoom, J Nieh… - Proceedings of the Tenth …, 2015 - dl.acm.org
With the continued proliferation of mobile devices, apps will increasingly become multi-
surface, running seamlessly across multiple user devices (e.g., phone, tablet, etc.). Yet …

Lightweight memory checkpointing

D Vogt, C Giuffrida, H Bos… - 2015 45th Annual IEEE …, 2015 - ieeexplore.ieee.org
Memory checkpointing is a pivotal technique in systems reliability, with applications ranging
from crash recovery to replay debugging. Unfortunately, many traditional memory check …