PAMI: A parallel active message interface for the Blue Gene/Q supercomputer
S Kumar, AR Mamidala, DA Faraj… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
The Blue Gene/Q machine is the next generation in the line of IBM massively parallel
supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each …
Enabling communication concurrency through flexible MPI endpoints
MPI defines a one-to-one relationship between MPI processes and ranks. This model
captures many use cases effectively; however, it also limits communication concurrency and …
Development of a knowledge-sharing parallel computing approach for calibrating distributed watershed hydrologic models
M Asgari, W Yang, J Lindsay, H Shao, Y Liu… - … Modelling & Software, 2023 - Elsevier
A research gap in calibrating distributed watershed hydrologic models lies in the
development of calibration frameworks adaptable to increasing complexity of hydrologic …
Enabling MPI interoperability through flexible communication endpoints
The current MPI model defines a one-to-one relationship between MPI processes and MPI
ranks. This model captures many use cases effectively, such as one MPI process per core …
Programming for exascale computers
Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …
MPI at Exascale
With petascale systems already available, researchers are devoting their attention to the
issues needed to reach the next major level in performance, namely, exascale. Explicit …
Exascale machines require new programming paradigms and runtimes
Extreme scale parallel computing systems will have tens of thousands of optionally
accelerator-equipped nodes with hundreds of cores each, as well as deep memory …
Efficient data race detection for distributed memory parallel programs
In this paper we present a precise data race detection technique for distributed memory
parallel programs. Our technique, which we call Active Testing, builds on our previous work …
CIVL: formal verification of parallel programs
CIVL is a framework for static analysis and verification of concurrent programs. One of the
main challenges to practical application of these techniques is the large number of ways to …
Multi-level load balancing with an integrated runtime approach
The recent trend of increasing numbers of cores per chip has resulted in vast amounts of on-
node parallelism. These high core counts result in hardware variability that introduces …