There goes the neighborhood: performance degradation due to nearby jobs
Predictable performance is important for understanding and alleviating application
performance issues; quantifying the effects of source code, compiler, or system software …
A Slurm simulator: Implementation and parametric analysis
NA Simakov, MD Innus, MD Jones, RL DeLeon… - … , and Simulation: 8th …, 2018 - Springer
Slurm is an open-source resource manager for HPC that provides high configurability for
inhomogeneous resources and job scheduling. Various Slurm parametric settings can …
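To make the "parametric settings" concrete, a minimal slurm.conf excerpt is sketched below, showing the scheduler knobs this line of work typically varies. All parameter names are real Slurm options, but the values are illustrative assumptions, not settings taken from the paper.

    # Minimal slurm.conf sketch (illustrative values, not from the paper)
    SchedulerType=sched/backfill            # backfill scheduler, the common production choice
    SchedulerParameters=bf_window=1440,bf_max_job_test=500   # planning horizon (min) and jobs considered per backfill pass
    SelectType=select/cons_tres             # trackable-resource selection; enables node sharing
    SelectTypeParameters=CR_Core_Memory     # allocate by cores and memory rather than whole nodes
    PriorityType=priority/multifactor       # weighted multifactor job priority
    PriorityWeightAge=1000                  # weight for queue wait time
    PriorityWeightFairshare=10000           # weight for fair-share usage history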
Satori: efficient and fair resource partitioning by sacrificing short-term benefits for long-term gains
Multi-core architectures have enabled data centers to increasingly co-locate multiple jobs to
improve resource utilization and lower the operational cost. Unfortunately, naively co …
Software Resource Disaggregation for HPC with Serverless Computing
Aggregated HPC resources have rigid allocation systems and programming models which
struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to …
Slurm simulator: Improving Slurm scheduler performance on large HPC systems by utilization of multiple controllers and node sharing
NA Simakov, RL DeLeon, MD Innus… - Proceedings of the …, 2018 - dl.acm.org
A Slurm simulator was used to study the potential benefits of using multiple Slurm controllers
and node-sharing on the TACC Stampede 2 system. Splitting a large cluster into smaller sub …
Enabling fair pricing on HPC systems with node sharing
Co-location, where multiple jobs share compute nodes in large-scale HPC systems, has
been shown to increase aggregate throughput and energy efficiency by 10 to 20 …
Hybrid resource management for HPC and data intensive workloads
High Performance Computing (HPC) and Data Intensive (DI) workloads have been executed
on separate clusters using different tools for resource and application management. With …
Analyzing HPC Monitoring Data With a View Towards Efficient Resource Utilization
Compute nodes in modern HPC systems are growing in size and their hardware has
become ever more diverse. Still, many HPC centers allocate the resources of full nodes …
Spread-n-share: improving application performance and cluster throughput with resource-aware job placement
Traditional batch job schedulers adopt the Compact-n-Exclusive (CE) strategy, packing
processes of a parallel job into as few compute nodes as possible. While CE minimizes inter …
Improving QoS and Utilisation in modern multi-core servers with Dynamic Cache Partitioning
Co-execution of multiple workloads in modern multi-core servers may create severe
performance degradation and unpredictable execution behavior, significantly impacting their …