Quiet neighborhoods: Key to protect job performance predictability
A Jokanovic, JC Sancho, G Rodriguez… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
Interference of nearby jobs has been recently identified as the dominant reason for the high
performance variability of parallel applications running on High Performance Computing …
Simulating and evaluating interconnection networks with INSEE
This paper describes INSEE, a simulation framework developed at the University of the
Basque Country. INSEE is designed to carry out performance-related studies of …
Links as a service (LaaS): Guaranteed tenant isolation in the shared cloud
The most demanding tenants of shared clouds require complete isolation from their
neighbors, in order to guarantee that their application performance is not affected by other …
L-PBF high-throughput data pipeline approach for multi-modal integration
Metal-based additive manufacturing requires active monitoring solutions for assessing part
quality. Multiple sensors and data streams, however, generate large heterogeneous data …
Integer programming based heterogeneous CPU–GPU cluster schedulers for SLURM resource manager
We present two integer programming based heterogeneous CPU–GPU cluster schedulers,
called IPSCHED and AUCSCHED, for the widely used SLURM resource manager. Our …
Balancing job performance with system performance via locality-aware scheduling on torus-connected systems
Torus-connected network is widely used in modern supercomputers due to its linear per
node cost scaling and its competitive overall performance. Job scheduling system plays a …
INRFlow: An interconnection networks research flow-level simulation framework
This paper presents INRFlow, a mature, frugal, flow-level simulation framework for modelling
large-scale networks and computing systems. INRFlow is designed to carry out performance …
Job migration in HPC clusters by means of checkpoint/restart
M Rodríguez-Pascual, J Cao, JA Moríñigo… - The Journal of …, 2019 - Springer
Until now, jobs running on HPC clusters were tied to the node where their execution started.
We have removed that limitation by integrating a user-level checkpoint/restart library into a …
A taxonomy of schedulers–operating systems, clusters and big data frameworks
L Sliwko - Global Journal of Computer Science and Technology, 2019 - researchgate.net
This review analyzes deployed and actively used workload schedulers' solutions and
presents a taxonomy in which those systems are divided into several hierarchical groups …
Performance and energy footprint assessment of FPGAs and GPUs on HPC systems using astrophysics application
New challenges in Astronomy and Astrophysics (AA) are urging the need for many
exceptionally computationally intensive simulations. “Exascale” (and beyond) computational …