Process placement in multicore clusters: Algorithmic issues and practical techniques

E Jeannot, G Mercier, F Tessier - IEEE Transactions on Parallel …, 2013 - ieeexplore.ieee.org
Current generations of NUMA node clusters feature multicore or manycore processors.
Programming such architectures efficiently is a challenge because numerous hardware …

KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

B Goglin, S Moreaud - Journal of Parallel and Distributed Computing, 2013 - Elsevier
The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …

A locality-aware Bruck allgather

A Bienz, S Gautam, A Kharel - Proceedings of the 29th European MPI …, 2022 - dl.acm.org
Collective algorithms are an essential part of MPI, allowing application programmers to
utilize underlying optimizations of common distributed operations. The MPI_Allgather …

Adaptive and hierarchical large message all-to-all communication algorithms for large-scale dense GPU systems

KS Khorassani, CH Chu, QG Anthony… - 2021 IEEE/ACM 21st …, 2021 - ieeexplore.ieee.org
In recent years, GPU-enhanced clusters have become more prevalent in High-Performance
Computing (HPC), leading to a demand for more efficient multi-GPU communication. This …

Topology-aware rank reordering for MPI collectives

SH Mirsadeghi, A Afsahi - 2016 IEEE International Parallel and …, 2016 - ieeexplore.ieee.org
As we move toward the Exascale era, HPC systems are becoming more complex,
introducing increasing levels of heterogeneity in communication channels. This leads to …

Node-aware improvements to allreduce

A Bienz, L Olson, W Gropp - 2019 IEEE/ACM Workshop on …, 2019 - ieeexplore.ieee.org
The MPI_Allreduce collective operation is a core kernel of many parallel codebases,
particularly for reductions over a single value per process. The commonly used allreduce …

Topology-aware GPU selection on multi-GPU nodes

I Faraji, SH Mirsadeghi, A Afsahi - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
GPU accelerators have successfully established themselves in modern HPC clusters due to
their high performance and energy efficiency. To increase the GPU computational power in …

HierKNEM: An adaptive framework for kernel-assisted and topology-aware collective communications on many-core clusters

T Ma, G Bosilca, A Bouteiller… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Multicore Clusters, which have become the most prominent form of High Performance
Computing (HPC) systems, challenge the performance of MPI applications with non uniform …

Prospects and challenges of virtual machine migration in HPC

S Pickartz, C Clauss, J Breitbart… - Concurrency and …, 2018 - Wiley Online Library
The continuous growth of supercomputers is accompanied by increased complexity of the
intra‐node level and the interconnection topology. Consequently, the whole software stack …

Sparbit: towards to a logarithmic-cost and data locality-aware MPI allgather algorithm

WJ Loch, GP Koslovski - Journal of Grid Computing, 2023 - Springer
Collective communication operations are considered critical for improving the performance
of exascale-ready and high-performance computing applications. On this work we focus on …