The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org
The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

A study on data deduplication in HPC storage systems

D Meister, J Kaiser, A Brinkmann… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
Deduplication is a storage-saving technique that is highly successful in enterprise backup
environments. On a file system, a single data block might be stored multiple times across …

The SIMNET virtual world architecture

J Calvin, A Dickens, B Gaines… - Proceedings of IEEE …, 1993 - ieeexplore.ieee.org
Many tools and techniques have been developed to address specific aspects of interacting
in a virtual world. Few have been designed with an architecture that allows large numbers of …

Exploring automatic, online failure recovery for scientific applications at extreme scales

M Gamell, DS Katz, H Kolla, J Chen… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org
Application resilience is a key challenge that must be addressed in order to realize the
exascale vision. Process/node failures, an important class of failures, are typically handled …

Exploration of lossy compression for application-level checkpoint/restart

N Sasaki, K Sato, T Endo… - 2015 IEEE international …, 2015 - ieeexplore.ieee.org
The scale of high performance computing (HPC) systems is exponentially growing,
potentially causing prohibitive shrinkage of mean time between failures (MTBF) while the …

Data compression for the exascale computing era - survey

SW Son, Z Chen, W Hendrix, A Agrawal… - Supercomputing …, 2014 - superfri.susu.ru
While periodic checkpointing has been an important mechanism for tolerating faults in high-
performance computing (HPC) systems, it is cost-prohibitive as the HPC system approaches …

Ultrafast error-bounded lossy compression for scientific datasets

X Yu, S Di, K Zhao, J Tian, D Tao, X Liang… - Proceedings of the 31st …, 2022 - dl.acm.org
Today's scientific high-performance computing applications and advanced instruments are
producing vast volumes of data across a wide range of domains, which impose a serious …

Exploring the feasibility of lossy compression for PDE simulations

J Calhoun, F Cappello, LN Olson… - … Journal of High …, 2019 - journals.sagepub.com
Checkpoint restart plays an important role in high-performance computing (HPC)
applications, allowing simulation runtime to extend beyond a single job allocation and …

A user-level infiniband-based file system and checkpoint strategy for burst buffers

K Sato, K Mohror, A Moody, T Gamblin… - 2014 14th IEEE/ACM …, 2014 - ieeexplore.ieee.org
Checkpoint/Restart is an indispensable fault tolerance technique commonly used by high-
performance computing applications that run continuously for hours or days at a time …

Efficient encoding and reconstruction of HPC datasets for checkpoint/restart

J Zhang, X Zhuo, A Moon, H Liu… - 2019 35th Symposium …, 2019 - ieeexplore.ieee.org
As the amount of data produced by HPC applications reaches the exabyte range,
compression techniques are often adopted to reduce the checkpoint time and volume. Since …