Process placement in multicore clusters: Algorithmic issues and practical techniques
Current generations of NUMA node clusters feature multicore or manycore processors.
Programming such architectures efficiently is a challenge because numerous hardware …
KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework
The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …
A locality-aware Bruck allgather
Collective algorithms are an essential part of MPI, allowing application programmers to
utilize underlying optimizations of common distributed operations. The MPI_Allgather …
Adaptive and hierarchical large message all-to-all communication algorithms for large-scale dense GPU systems
In recent years, GPU-enhanced clusters have become more prevalent in High-Performance
Computing (HPC), leading to a demand for more efficient multi-GPU communication. This …
Topology-aware rank reordering for MPI collectives
SH Mirsadeghi, A Afsahi - 2016 IEEE International Parallel and …, 2016 - ieeexplore.ieee.org
As we move toward the Exascale era, HPC systems are becoming more complex,
introducing increasing levels of heterogeneity in communication channels. This leads to …
Node-aware improvements to allreduce
The MPI_Allreduce collective operation is a core kernel of many parallel codebases,
particularly for reductions over a single value per process. The commonly used allreduce …
Topology-aware GPU selection on multi-GPU nodes
I Faraji, SH Mirsadeghi, A Afsahi - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
GPU accelerators have successfully established themselves in modern HPC clusters due to
their high performance and energy efficiency. To increase the GPU computational power in …
HierKNEM: An adaptive framework for kernel-assisted and topology-aware collective communications on many-core clusters
Multicore Clusters, which have become the most prominent form of High Performance
Computing (HPC) systems, challenge the performance of MPI applications with non uniform …
Prospects and challenges of virtual machine migration in HPC
The continuous growth of supercomputers is accompanied by increased complexity of the
intra‐node level and the interconnection topology. Consequently, the whole software stack …
Sparbit: towards to a logarithmic-cost and data locality-aware MPI allgather algorithm
WJ Loch, GP Koslovski - Journal of Grid Computing, 2023 - Springer
Collective communication operations are considered critical for improving the performance
of exascale-ready and high-performance computing applications. On this work we focus on …