A survey on malleability solutions for high-performance distributed computing
Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-
Performance Computing (HPC) facilities is a cornerstone in the next generation of exascale …
ElastiSim: a batch-system simulator for malleable workloads
As high-performance computing infrastructures move towards exascale, the role of resource
and job management systems is more critical now than ever. Simulating batch systems to …
Hybrid workload scheduling on HPC systems
Traditionally, on-demand, rigid, and malleable applications have been scheduled and
executed on separate systems. The ever-growing workload demands and rapidly …
Multi-Agent Deep Reinforcement Learning-Based Resource Allocation in HPC/AI Converged Cluster.
J Narantuya, JS Shin, S Park… - Computers, Materials & …, 2022 - cdn.techscience.cn
As the complexity of deep learning (DL) networks and training data grows enormously,
methods that scale with computation are becoming the future of artificial intelligence (AI) …
Transparent resource elasticity for task-based cluster environments with work stealing
J Posner, C Fohry - 50th International Conference on Parallel …, 2021 - dl.acm.org
Resource elasticity allows to dynamically change the resources of running jobs, which may
significantly improve the throughput on supercomputers. Elasticity requires support from …
Software Resource Disaggregation for HPC with Serverless Computing
Aggregated HPC resources have rigid allocation systems and programming models which
struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to …
Adaptive elasticity policies for staging-based in situ visualization
In situ processing aims to alleviate the growing gap between computation and I/O
capabilities by performing data processing close to the data source. In situ processing is …
Enhancing supercomputer performance with malleable job scheduling strategies
J Posner, F Hupfeld, P Finnerty - European Conference on Parallel …, 2023 - Springer
In recent years, supercomputers have experienced significant advancements in
performance and have grown in size, now comprising several thousands nodes. To unlock …
A data science pipeline synchronisation method for edge-fog-cloud continuum
DD Sanchez-Gallegos, JL Gonzalez-Compean… - Proceedings of the SC' …, 2023 - dl.acm.org
This paper presents an adaptive data delivery method for data science pipelines. While this
method is feasible for processes communicating over any network, in this work we focus on …
Dynamic Resource Management for Elastic Scientific Workflows using PMIx
R Bhattarai, H Pritchard… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
In current scientific workflows, the computational needs of tasks might not be known when it
is submitted to a system for execution. Current resource management (RM) systems and …