HPX--an open source C++ standard library for parallelism and concurrency

T Heller, P Diehl, Z Byerly, J Biddiscombe… - arXiv preprint arXiv …, 2023 - arxiv.org
To achieve scalability with today's heterogeneous HPC resources, we need a dramatic shift
in our thinking; MPI+X is not enough. Asynchronous Many Task (AMT) runtime systems …

Enabling communication concurrency through flexible MPI endpoints

J Dinan, RE Grant, P Balaji, D Goodell… - … Journal of High …, 2014 - journals.sagepub.com
MPI defines a one-to-one relationship between MPI processes and ranks. This model
captures many use cases effectively; however, it also limits communication concurrency and …

MT-MPI: Multithreaded MPI for many-core environments

M Si, AJ Peña, P Balaji, M Takagi… - Proceedings of the 28th …, 2014 - dl.acm.org
Many-core architectures, such as the Intel Xeon Phi, provide dozens of cores and hundreds
of hardware threads. To utilize such architectures, application programmers are increasingly …

Enabling efficient multithreaded MPI communication through a library-based implementation of MPI endpoints

S Sridharan, J Dinan… - SC'14: Proceedings of the …, 2014 - ieeexplore.ieee.org
Modern high-speed interconnection networks are designed with capabilities to support
communication from multiple processor cores. The MPI endpoints extension has been …

Why is MPI so slow? Analyzing the fundamental limits in implementing MPI-3.1

K Raffenetti, A Amer, L Oden, C Archer… - Proceedings of the …, 2017 - dl.acm.org
This paper provides an in-depth analysis of the software overheads in the MPI
performance-critical path and exposes mandatory performance overheads that are unavoidable based on …

How I learned to stop worrying about user-visible endpoints and love MPI

R Zambre, A Chandramowlishwaran… - Proceedings of the 34th …, 2020 - dl.acm.org
MPI+threads is gaining prominence as an alternative to the traditional "MPI everywhere"
model in order to better handle the disproportionate increase in the number of cores …

Lessons learned on MPI+threads communication

R Zambre… - … Conference for High …, 2022 - ieeexplore.ieee.org
Hybrid MPI+threads programming is gaining prominence, but, in practice, applications
perform slower with it compared to the MPI everywhere model. The most critical challenge to …

Callback-based completion notification using MPI Continuations

J Schuchart, P Samfass, C Niethammer, J Gracia… - Parallel Computing, 2021 - Elsevier
Asynchronous programming models (APM) are gaining more and more traction, allowing
applications to expose the available concurrency to a runtime system tasked with …

Process-in-process: techniques for practical address-space sharing

A Hori, M Si, B Gerofi, M Takagi, J Dayal… - Proceedings of the 27th …, 2018 - dl.acm.org
The two most common parallel execution models for many-core CPUs today are
multiprocess (e.g., MPI) and multithread (e.g., OpenMP). The multiprocess model allows each …

A heterogeneous MPI+PPL task scheduling approach for asynchronous many-task runtime systems

J Holmen, D Sahasrabudhe, M Berzins - Practice and Experience in …, 2021 - dl.acm.org
Asynchronous many-task runtime systems and MPI+X hybrid parallelism approaches have
shown promise for helping manage the increasing complexity of nodes in current and …