A survey on malleability solutions for high-performance distributed computing

JI Aliaga, M Castillo, S Iserte, I Martín-Álvarez… - Applied Sciences, 2022 - mdpi.com
Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-
Performance Computing (HPC) facilities is a cornerstone in the next generation of exascale …

ElastiSim: a batch-system simulator for malleable workloads

T Özden, T Beringer, A Mazaheri, HM Fard… - Proceedings of the 51st …, 2022 - dl.acm.org
As high-performance computing infrastructures move towards exascale, the role of resource
and job management systems is more critical now than ever. Simulating batch systems to …

Hybrid workload scheduling on HPC systems

Y Fan, Z Lan, P Rich, W Allcock… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Traditionally, on-demand, rigid, and malleable applications have been scheduled and
executed on separate systems. The ever-growing workload demands and rapidly …

Multi-Agent Deep Reinforcement Learning-Based Resource Allocation in HPC/AI Converged Cluster

J Narantuya, JS Shin, S Park… - Computers, Materials & …, 2022 - cdn.techscience.cn
As the complexity of deep learning (DL) networks and training data grows enormously,
methods that scale with computation are becoming the future of artificial intelligence (AI) …

Transparent resource elasticity for task-based cluster environments with work stealing

J Posner, C Fohry - 50th International Conference on Parallel …, 2021 - dl.acm.org
Resource elasticity allows to dynamically change the resources of running jobs, which may
significantly improve the throughput on supercomputers. Elasticity requires support from …

Software Resource Disaggregation for HPC with Serverless Computing

M Copik, M Chrapek, L Schmid, A Calotoiu… - arXiv preprint arXiv …, 2024 - arxiv.org
Aggregated HPC resources have rigid allocation systems and programming models which
struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to …

Adaptive elasticity policies for staging-based in situ visualization

Z Wang, M Dorier, P Subedi, PE Davis… - Future Generation …, 2023 - Elsevier
In situ processing aims to alleviate the growing gap between computation and I/O
capabilities by performing data processing close to the data source. In situ processing is …

Enhancing supercomputer performance with malleable job scheduling strategies

J Posner, F Hupfeld, P Finnerty - European Conference on Parallel …, 2023 - Springer
In recent years, supercomputers have experienced significant advancements in
performance and have grown in size, now comprising several thousands nodes. To unlock …

A data science pipeline synchronisation method for edge-fog-cloud continuum

DD Sanchez-Gallegos, JL Gonzalez-Compean… - Proceedings of the SC' …, 2023 - dl.acm.org
This paper presents an adaptive data delivery method for data science pipelines. While this
method is feasible for processes communicating over any network, in this work we focus on …

Dynamic Resource Management for Elastic Scientific Workflows using PMIx

R Bhattarai, H Pritchard… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
In current scientific workflows, the computational needs of tasks might not be known when it
is submitted to a system for execution. Current resource management (RM) systems and …