I/O access patterns in HPC applications: A 360-degree survey

JL Bez, S Byna, S Ibrahim - ACM Computing Surveys, 2023 - dl.acm.org
The high-performance computing I/O stack has been complex due to multiple software
layers, the inter-dependencies among these layers, and the different performance tuning …

Artificial intelligence for biology

S Hassoun, F Jefferson, X Shi, B Stucky… - Integrative and …, 2021 - academic.oup.com
Despite efforts to integrate research across different subdisciplines of biology, the scale of
integration remains limited. We hypothesize that future generations of Artificial Intelligence …

Systematically inferring I/O performance variability by examining repetitive job behavior

E Costa, T Patel, B Schwaller, JM Brandt… - Proceedings of the …, 2021 - dl.acm.org
Monitoring and analyzing I/O behaviors is critical to the efficient utilization of parallel storage
systems. Unfortunately, with increasing I/O requirements and resource contention, I/O …

Capturing periodic I/O using frequency techniques

A Tarraf, A Bandet, F Boito, G Pallez… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Many HPC applications perform their I/O in bursts that follow a periodic pattern. This allows
for making predictions as to when a burst occurs. System providers can take advantage of …

Understanding The Effectiveness of Lossy Compression in Machine Learning Training Sets

R Underwood, JC Calhoun, S Di, F Cappello - arXiv preprint arXiv …, 2024 - arxiv.org
Machine Learning and Artificial Intelligence (ML/AI) techniques have become increasingly prevalent
in high performance computing (HPC). However, these methods depend on vast volumes of …

High-Quality I/O Bandwidth Prediction with Minimal Data via Transfer Learning Workflow

D Povaliaiev, R Liem, J Kunkel… - 2024 IEEE 36th …, 2024 - ieeexplore.ieee.org
Providing a high-quality performance prediction has the potential to enhance various
aspects of a cluster, such as devising scheduling and provisioning policies, guiding …

Graph3PO: A Temporal Graph Data Processing Method for Latency QoS Guarantee in Object Cloud Storage System

W Zhang, Z Shi, Z Liao, Y Li, Y Du, Y Wu… - Proceedings of the …, 2023 - dl.acm.org
Object cloud storage systems are deployed with diverse applications that have varying
latency service level objectives (SLOs), posing challenges for supporting quality of service …

Gauge: An interactive data-driven visualization tool for HPC application I/O performance analysis

E Del Rosario, M Currier, M Isakov… - 2020 IEEE/ACM Fifth …, 2020 - ieeexplore.ieee.org
Understanding and alleviating I/O bottlenecks in HPC system workloads is difficult due to the
complex, multilayered nature of HPC I/O subsystems. Even with full visibility into the jobs …

Development of an equation-based parallelization method for multiphase particle-in-cell simulations

M Woo, T Jordan, T Nandi, JF Dietiker… - Engineering with …, 2023 - Springer
Manufacturers have been developing new graphics processing unit (GPU) nodes with large
capacity, high bandwidth memory and very high bandwidth intra-node interconnects. This …

Interpretable Analysis of Production GPU Clusters Monitoring Data via Association Rule Mining

B Li, S Samsi, V Gadepally… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Modern high-performance computing (HPC) and cloud computing systems are integrating
powerful GPUs to accelerate increasingly demanding deep learning workloads. To improve …