A survey on malleability solutions for high-performance distributed computing

JI Aliaga, M Castillo, S Iserte, I Martín-Álvarez… - Applied Sciences, 2022 - mdpi.com
Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-
Performance Computing (HPC) facilities is a cornerstone in the next generation of exascale …

Dynamic spawning of MPI processes applied to malleability

I Martín-Álvarez, JI Aliaga, M Castillo… - … Journal of High …, 2024 - journals.sagepub.com
Malleability allows computing facilities to adapt their workloads through resource
management systems to maximize the throughput of the facility and the efficiency of the …

Transparent resource elasticity for task-based cluster environments with work stealing

J Posner, C Fohry - 50th International Conference on Parallel …, 2021 - dl.acm.org
Resource elasticity allows the resources of running jobs to be changed dynamically, which may
significantly improve the throughput on supercomputers. Elasticity requires support from …

Malleable APGAS programs and their support in batch job schedulers

P Finnerty, L Takaoka, T Kanzaki, J Posner - European Conference on …, 2023 - Springer
Malleability—the ability for applications to dynamically adjust their resource allocations at
runtime—presents great potential to enhance the efficiency and resource utilization of …

Enhancing supercomputer performance with malleable job scheduling strategies

J Posner, F Hupfeld, P Finnerty - European Conference on Parallel …, 2023 - Springer
In recent years, supercomputers have experienced significant advancements in
performance and have grown in size, now comprising several thousand nodes. To unlock …

Proteo: a framework for the generation and evaluation of malleable MPI applications

I Martín-Álvarez, JI Aliaga, M Castillo… - The Journal of …, 2024 - Springer
Applying malleability to HPC systems can increase their productivity without degrading, or
even while improving, the performance of running applications. This paper presents Proteo, a …

Scheduling of elastic message passing applications on HPC systems

DH Lina, S Ghafoor, T Hines - Workshop on Job Scheduling Strategies for …, 2022 - Springer
Elastic parallel applications that can change the number of processors while being executed
promise improved application and system performance, allow new classes of data and event …

Evaluating Data Redistribution in PaRSEC

Q Cao, G Bosilca, N Losada, W Wu… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Data redistribution aims to reshuffle data to optimize some objective for an algorithm. The
objective can be multi-dimensional, such as improving computational load balance or …

An emulation layer for dynamic resources with MPI sessions

J Fecht, M Schreiber, M Schulz, H Pritchard… - … Conference on High …, 2022 - Springer
The current static job scheduling on supercomputers for MPI-based applications is well
known to be a limiting factor for the exploitation of a system's top performance in terms of …

Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities

A Tarraf, M Schreiber, A Cascajo… - … on Parallel and …, 2024 - ieeexplore.ieee.org
With the increase of complex scientific simulations driven by workflows and heterogeneous
workload profiles, managing system resources effectively is essential for improving …