Quiet neighborhoods: Key to protect job performance predictability

A Jokanovic, JC Sancho, G Rodriguez… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
Interference of nearby jobs has been recently identified as the dominant reason for the high
performance variability of parallel applications running on High Performance Computing …

Simulating and evaluating interconnection networks with INSEE

J Navaridas, J Miguel-Alonso, JA Pascual… - … Modelling Practice and …, 2011 - Elsevier
This paper describes INSEE, a simulation framework developed at the University of the
Basque Country. INSEE is designed to carry out performance-related studies of …

Links as a service (LaaS): Guaranteed tenant isolation in the shared cloud

E Zahavi, A Shpiner, O Rottenstreich… - Proceedings of the …, 2016 - dl.acm.org
The most demanding tenants of shared clouds require complete isolation from their
neighbors, in order to guarantee that their application performance is not affected by other …

L-PBF high-throughput data pipeline approach for multi-modal integration

KJ Hernandez, TG Ciardi, R Yamamoto, M Lu… - Integrating Materials and …, 2024 - Springer
Metal-based additive manufacturing requires active monitoring solutions for assessing part
quality. Multiple sensors and data streams, however, generate large heterogeneous data …

Integer programming based heterogeneous CPU–GPU cluster schedulers for SLURM resource manager

S Soner, C Özturan - Journal of Computer and System Sciences, 2015 - Elsevier
We present two integer programming based heterogeneous CPU–GPU cluster schedulers,
called IPSCHED and AUCSCHED, for the widely used SLURM resource manager. Our …

Balancing job performance with system performance via locality-aware scheduling on torus-connected systems

X Yang, Z Zhou, W Tang, X Zheng… - … on Cluster Computing …, 2014 - ieeexplore.ieee.org
Torus-connected network is widely used in modern supercomputers due to its linear per
node cost scaling and its competitive overall performance. Job scheduling system plays a …

INRFlow: An interconnection networks research flow-level simulation framework

J Navaridas, JA Pascual, A Erickson, IA Stewart… - Journal of parallel and …, 2019 - Elsevier
This paper presents INRFlow, a mature, frugal, flow-level simulation framework for modelling
large-scale networks and computing systems. INRFlow is designed to carry out performance …

Job migration in HPC clusters by means of checkpoint/restart

M Rodríguez-Pascual, J Cao, JA Moríñigo… - The Journal of …, 2019 - Springer
Until now, jobs running on HPC clusters were tied to the node where their execution started.
We have removed that limitation by integrating a user-level checkpoint/restart library into a …

A taxonomy of schedulers – operating systems, clusters and big data frameworks

L Sliwko - Global Journal of Computer Science and Technology, 2019 - researchgate.net
This review analyzes deployed and actively used workload schedulers' solutions and
presents a taxonomy in which those systems are divided into several hierarchical groups …

Performance and energy footprint assessment of FPGAs and GPUs on HPC systems using astrophysics application

D Goz, G Ieronymakis, V Papaefstathiou, N Dimou… - Computation, 2020 - mdpi.com
New challenges in Astronomy and Astrophysics (AA) are urging the need for many
exceptionally computationally intensive simulations. “Exascale” (and beyond) computational …