An ephemeral burst-buffer file system for scientific applications

T Wang, K Mohror, A Moody, K Sato… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org
Burst buffers are becoming an indispensable hardware resource on large-scale
supercomputers to buffer the bursty I/O from scientific applications. However, there is a lack …

Exploration of lossy compression for application-level checkpoint/restart

N Sasaki, K Sato, T Endo… - 2015 IEEE international …, 2015 - ieeexplore.ieee.org
The scale of high performance computing (HPC) systems is exponentially growing,
potentially causing prohibitive shrinkage of mean time between failures (MTBF) while the …

Hermes: a heterogeneous-aware multi-tiered distributed I/O buffering system

A Kougkas, H Devarajan, XH Sun - Proceedings of the 27th International …, 2018 - dl.acm.org
Modern High-Performance Computing (HPC) systems are adding extra layers to the memory
and storage hierarchy named deep memory and storage hierarchy (DMSH), to increase I/O …

Scalable I/O-aware job scheduling for burst buffer enabled HPC clusters

S Herbein, DH Ahn, D Lipari, TRW Scogland… - Proceedings of the 25th …, 2016 - dl.acm.org
The economics of flash vs. disk storage is driving HPC centers to incorporate faster
solid-state burst buffers into the storage hierarchy in exchange for smaller parallel file system …

Efficient user-level storage disaggregation for deep learning

Y Zhu, W Yu, B Jiao, K Mohror, A Moody… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
On large-scale high performance computing (HPC) systems, applications are provisioned
with aggregated resources to meet their peak demands for brief periods. This results in …

Managing I/O interference in a shared burst buffer system

S Thapaliya, P Bangalore, J Lofstead… - 2016 45th …, 2016 - ieeexplore.ieee.org
In this work, we investigate the problem of inter-application interference in a shared Burst
Buffer (BB) system. A BB is a new storage technology for HPC architectures that acts as an …

HPC storage service autotuning using variational-autoencoder-guided asynchronous Bayesian optimization

M Dorier, R Egele, P Balaprakash, J Koo… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Distributed data storage services tailored to specific applications have grown popular in the
high-performance computing (HPC) community as a way to address I/O and storage …

Survey of storage systems for high-performance computing

J Lüttgau, M Kuhn, K Duwe, Y Alforov… - Supercomputing …, 2018 - centaur.reading.ac.uk
In current supercomputers, storage is typically provided by parallel distributed file systems
for hot data and tape archives for cold data. These file systems are often compatible with …

Quantifying I/O and communication traffic interference on dragonfly networks equipped with burst buffers

M Mubarak, P Carns, J Jenkins, JK Li… - … on cluster computing …, 2017 - ieeexplore.ieee.org
HPC systems have shifted to burst buffer storage and high radix interconnect topologies in
order to meet the challenges of large-scale, data-intensive scientific computing. Both of …

Performance characterization of scientific workflows for the optimal use of burst buffers

CS Daley, D Ghoshal, GK Lockwood, S Dosanjh… - Future Generation …, 2020 - Elsevier
Scientific discoveries are increasingly dependent upon the analysis of large volumes of data
from observations and simulations of complex phenomena. Scientists compose the complex …