MPI datatype processing using runtime compilation

D De Sensi, T Bonato, D Saam, T Hoefler - 21st USENIX Symposium on …, 2024 - usenix.org

The allreduce collective operation accounts for a significant fraction of the runtime of
workloads running on distributed systems. One factor determining its performance is the …

Cited by 3 Related articles All 5 versions

Network-accelerated non-contiguous memory transfers

S Di Girolamo, K Taranov, A Kurth… - Proceedings of the …, 2019 - dl.acm.org

Applications often communicate data that is non-contiguous in the send- or the
receive-buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non …

Cited by 33 Related articles All 29 versions

HAND: A hybrid approach to accelerate non-contiguous data movement using MPI datatypes on GPU clusters

R Shi, X Lu, S Potluri, K Hamidouche… - 2014 43rd …, 2014 - ieeexplore.ieee.org

An increasing number of MPI applications are being ported to take advantage of the
compute power offered by GPUs. Data movement continues to be the major bottleneck on …

Cited by 33 Related articles All 3 versions

MPI derived datatypes: Performance and portability issues

Q Xiong, PV Bangalore, A Skjellum… - Proceedings of the 25th …, 2018 - dl.acm.org

This paper addresses performance-portability and overall performance issues when derived
datatypes are used with four MPI implementations: Open MPI, MPICH, MVAPICH2, and Intel …

Cited by 23 Related articles All 2 versions

FALCON: Efficient designs for zero-copy MPI datatype processing on emerging architectures

JM Hashmi, S Chakraborty, M Bayatpour… - 2019 IEEE …, 2019 - ieeexplore.ieee.org

Derived datatypes are commonly used in MPI applications to exchange non-contiguous data
among processes. However, state-of-the-art MPI libraries do not offer efficient processing of …

Cited by 17 Related articles All 3 versions

FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures

JM Hashmi, CH Chu, S Chakraborty… - Journal of Parallel and …, 2020 - Elsevier

This paper addresses the challenges of MPI derived datatype processing and proposes
FALCON-X—A Fast and Low-overhead Communication framework for optimized zero-copy …

Cited by 14 Related articles All 2 versions

On the expected and observed communication performance with MPI derived datatypes

A Carpen-Amarie, S Hunold, JL Träff - … of the 23rd European MPI Users' …, 2016 - dl.acm.org

We examine natural expectations on communication performance using MPI derived
datatypes in comparison to the baseline, "raw" performance of communicating simple …

Cited by 21 Related articles All 6 versions

High performance MPI datatype support with user-mode memory registration: Challenges, designs, and benefits

M Li, H Subramoni, K Hamidouche… - … on Cluster Computing, 2015 - ieeexplore.ieee.org

Noncontiguous data communication has been heavily adopted in scientific applications,
especially for those written with MPI. Common strategies to handle noncontiguous data, like …

Cited by 22 Related articles All 5 versions

FPsPIN: An FPGA-based Open-Hardware Research Platform for Processing in the Network

T Schneider, P Xu, T Hoefler - arXiv preprint arXiv:2405.16378, 2024 - arxiv.org

In the era of post-Moore computing, network offload emerges as a solution to two
challenges: the imperative for low-latency communication and the push towards hardware …

Cited by 1 Related articles All 2 versions

Evaluating Data Redistribution in PaRSEC

Q Cao, G Bosilca, N Losada, W Wu… - … on Parallel and …, 2021 - ieeexplore.ieee.org

Data redistribution aims to reshuffle data to optimize some objective for an algorithm. The
objective can be multi-dimensional, such as improving computational load balance or …

Cited by 4 Related articles All 10 versions