Implementing OpenSHMEM using MPI-3 one-sided communication

JR Hammond, S Ghosh, BM Chapman - … , Annapolis, MD, USA, March 4-6 …, 2014 - Springer
This paper reports the design and implementation of OpenSHMEM over MPI using new one-sided communication features in MPI-3, which include not only new functions (e.g., remote …

A comprehensive performance evaluation of OpenSHMEM libraries on InfiniBand clusters

J Jose, J Zhang, A Venkatesh, S Potluri… - … on OpenSHMEM and …, 2014 - Springer
OpenSHMEM is an open standard that brings together several long-standing, vendor-specific SHMEM implementations that allows applications to use SHMEM in a platform …

Proxy-equation paradigm: A strategy for massively parallel asynchronous computations

A Mittal, S Girimaji - Physical Review E, 2017 - APS
Massively parallel simulations of transport equation systems call for a paradigm change in
algorithm development to achieve efficient scalability. Traditional approaches require time …

Maximizing application performance in a multi-core, NUMA-aware compute cluster by multi-level tuning

G Shainer, P Lui, M Hilgeman, J Layton… - … Conference, ISC 2013 …, 2013 - Springer
Achieving good application performance on a modern compute cluster of multi-core, multi-socket, NUMA-aware systems can be challenging. In this paper, we use VASP, a popular ab …

Early evaluation of scalable fabric interface for PGAS programming models

M Luo, K Seager, KS Murthy, CJ Archer, S Sur… - Proceedings of the 8th …, 2014 - dl.acm.org
Inter-processor communication is a critical factor for performance at scale. In order to
achieve good performance, communication overheads should be minimized. The fabric …

Optimal partitioning for parallel matrix computation on a small number of abstract heterogeneous processors

A DeFlumere - 2014 - 137.43.92.117
High Performance Computing (HPC) has grown to encompass many new
architectures and algorithms. The Top500 list, which ranks the world's fastest …

Effects of Processor-Native Memory Transactions in Optimizing RDMA Transfers in Distributed Shared Memory Systems

K Paraskevas - 2021 - search.proquest.com
Reducing latency and increasing the throughput of issued data transfers is a core
requirement if we are to meet the needs of future systems at scale, and therefore, fast …

Analysing the influence of InfiniBand choice on OpenMPI memory consumption

O Perks, DA Beckingsale, AS Dawes… - … Conference on High …, 2013 - ieeexplore.ieee.org
The ever increasing scale of modern high performance computing platforms poses
challenges for system architects and code developers alike. The increase in core count …

'Proxy-equation' paradigm: A novel strategy for massively-parallel asynchronous computations

A Mittal, S Girimaji - arXiv preprint arXiv:1611.04985, 2016 - arxiv.org
Massively parallel simulations of transport equation systems call for a paradigm change in
algorithm development to achieve efficient scalability. Traditional approaches require time …

Temporal Reasoning in Medicine for Type 2 Diabetes Mellitus Patient Outcomes and Treatments Using Dynamic Bayesian Networks

RL Angell - 2018 - search.proquest.com
Medicine is the art and science of diagnosis and treatment of disease-maintenance of one's
health. Temporal reasoning in medicine is the art and practice of modeling one's …