PAMI: A parallel active message interface for the Blue Gene/Q supercomputer

S Kumar, AR Mamidala, DA Faraj… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
The Blue Gene/Q machine is the next generation in the line of IBM massively parallel
supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each …

Enabling communication concurrency through flexible MPI endpoints

J Dinan, RE Grant, P Balaji, D Goodell… - … Journal of High …, 2014 - journals.sagepub.com
MPI defines a one-to-one relationship between MPI processes and ranks. This model
captures many use cases effectively; however, it also limits communication concurrency and …

Development of a knowledge-sharing parallel computing approach for calibrating distributed watershed hydrologic models

M Asgari, W Yang, J Lindsay, H Shao, Y Liu… - … Modelling & Software, 2023 - Elsevier
A research gap in calibrating distributed watershed hydrologic models lies in the
development of calibration frameworks adaptable to increasing complexity of hydrologic …

Enabling MPI interoperability through flexible communication endpoints

J Dinan, P Balaji, D Goodell, D Miller, M Snir… - Proceedings of the 20th …, 2013 - dl.acm.org
The current MPI model defines a one-to-one relationship between MPI processes and MPI
ranks. This model captures many use cases effectively, such as one MPI process per core …

Programming for exascale computers

W Gropp, M Snir - Computing in Science & Engineering, 2013 - ieeexplore.ieee.org
Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …

MPI at Exascale

R Thakur, P Balaji, D Buntinas, D Goodell… - Proceedings of …, 2010 - aegjcef.unixer.de
With petascale systems already available, researchers are devoting their attention to the
issues needed to reach the next major level in performance, namely, exascale. Explicit …

Exascale machines require new programming paradigms and runtimes

G Da Costa, T Fahringer, JAR Gallego… - Supercomputing …, 2015 - superfri.org
Extreme scale parallel computing systems will have tens of thousands of optionally
accelerator-equipped nodes with hundreds of cores each, as well as deep memory …

Efficient data race detection for distributed memory parallel programs

CS Park, K Sen, P Hargrove, C Iancu - Proceedings of 2011 International …, 2011 - dl.acm.org
In this paper we present a precise data race detection technique for distributed memory
parallel programs. Our technique, which we call Active Testing, builds on our previous work …

CIVL: formal verification of parallel programs

M Zheng, MS Rogers, Z Luo, MB Dwyer… - 2015 30th IEEE/ACM …, 2015 - ieeexplore.ieee.org
CIVL is a framework for static analysis and verification of concurrent programs. One of the
main challenges to practical application of these techniques is the large number of ways to …

Multi-level load balancing with an integrated runtime approach

S Bak, H Menon, S White, M Diener… - 2018 18th IEEE/ACM …, 2018 - ieeexplore.ieee.org
The recent trend of increasing numbers of cores per chip has resulted in vast amounts of
on-node parallelism. These high core counts result in hardware variability that introduces …