Process distance-aware adaptive MPI collective communications

E Jeannot, G Mercier, F Tessier - IEEE Transactions on Parallel …, 2013 - ieeexplore.ieee.org

Current generations of NUMA node clusters feature multicore or manycore processors.
Programming such architectures efficiently is a challenge because numerous hardware …

Cited by 154 · Related articles · All 18 versions

KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

B Goglin, S Moreaud - Journal of Parallel and Distributed Computing, 2013 - Elsevier

The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …

Cited by 111 · Related articles · All 12 versions

A locality-aware Bruck allgather

A Bienz, S Gautam, A Kharel - Proceedings of the 29th European MPI …, 2022 - dl.acm.org

Collective algorithms are an essential part of MPI, allowing application programmers to
utilize underlying optimizations of common distributed operations. The MPI_Allgather …

Cited by 12 · Related articles · All 7 versions

Adaptive and hierarchical large message all-to-all communication algorithms for large-scale dense GPU systems

KS Khorassani, CH Chu, QG Anthony… - 2021 IEEE/ACM 21st …, 2021 - ieeexplore.ieee.org

In recent years, GPU-enhanced clusters have become more prevalent in High-Performance
Computing (HPC), leading to a demand for more efficient multi-GPU communication. This …

Cited by 16 · Related articles · All 3 versions

Topology-aware rank reordering for MPI collectives

SH Mirsadeghi, A Afsahi - 2016 IEEE International Parallel and …, 2016 - ieeexplore.ieee.org

As we move toward the Exascale era, HPC systems are becoming more complex,
introducing increasing levels of heterogeneity in communication channels. This leads to …

Cited by 39 · Related articles · All 3 versions

Node-aware improvements to allreduce

A Bienz, L Olson, W Gropp - 2019 IEEE/ACM Workshop on …, 2019 - ieeexplore.ieee.org

The MPI_Allreduce collective operation is a core kernel of many parallel codebases,
particularly for reductions over a single value per process. The commonly used allreduce …

Cited by 19 · Related articles · All 9 versions

Topology-aware GPU selection on multi-GPU nodes

I Faraji, SH Mirsadeghi, A Afsahi - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

GPU accelerators have successfully established themselves in modern HPC clusters due to
their high performance and energy efficiency. To increase the GPU computational power in …

Cited by 32 · Related articles · All 5 versions

HierKNEM: An adaptive framework for kernel-assisted and topology-aware collective communications on many-core clusters

T Ma, G Bosilca, A Bouteiller… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org

Multicore Clusters, which have become the most prominent form of High Performance
Computing (HPC) systems, challenge the performance of MPI applications with non uniform …

Cited by 51 · Related articles · All 16 versions

Prospects and challenges of virtual machine migration in HPC

S Pickartz, C Clauss, J Breitbart… - Concurrency and …, 2018 - Wiley Online Library

The continuous growth of supercomputers is accompanied by increased complexity of the
intra‐node level and the interconnection topology. Consequently, the whole software stack …

Cited by 12 · Related articles

Sparbit: towards to a logarithmic-cost and data locality-aware MPI allgather algorithm

WJ Loch, GP Koslovski - Journal of Grid Computing, 2023 - Springer

Collective communication operations are considered critical for improving the performance
of exascale-ready and high-performance computing applications. On this work we focus on …

Cited by 2 · Related articles · All 3 versions