Programming for exascale computers

W Gropp, M Snir - Computing in Science & Engineering, 2013 - ieeexplore.ieee.org
Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …

Towards efficient remote OpenMP offloading

W Lu, B Shan, E Raut, J Meng, M Araya-Polo… - … Workshop on OpenMP, 2022 - Springer
On modern heterogeneous HPC systems, the most popular way to realize distributed
computation is the hybrid programming model of MPI+X (X being OpenMP/CUDA/etc.), as it …

MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation

B Shan, M Araya-Polo, AM Malik… - Proceedings of the 14th …, 2023 - dl.acm.org
MPI+X is the most popular hybrid programming model for distributed computation on
modern heterogeneous HPC systems. Nonetheless, for simplicity, HPC developers ideally …

The tiny-tasks granularity trade-off: Balancing overhead versus performance in parallel systems

S Bora, B Walker, M Fidler - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Models of parallel processing systems typically assume that one has workers and jobs are
split into an equal number of tasks. Splitting jobs into smaller tasks, i.e., using “tiny tasks”, can …

[HTML] Efficient memory copy operations on the 48-core Intel SCC processor

MW Van Tol, R Bakker, M Verstraaten… - 3rd Many-core …, 2011 - books.google.com
The Single-chip Cloud Computer (SCC) is a 48-core experimental processor created by Intel
Labs targeting the many-core research community. It has hardware support for sending short …

How to scale nested OpenMP applications on the ScaleMP vSMP architecture

D Schmidl, C Terboven, A Wolf… - … on Cluster Computing, 2010 - ieeexplore.ieee.org
The novel ScaleMP vSMP architecture employs commodity x86-based servers with an
InfiniBand network to assemble a large shared memory system at an attractive price point …

Performance metrics in a hybrid MPI–OpenMP based molecular dynamics simulation with short-range interactions

A Pal, A Agarwala, S Raha, B Bhattacharya - Journal of Parallel and …, 2014 - Elsevier
We discuss the computational bottlenecks in molecular dynamics (MD) and describe the
challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm …

Unified programming concepts for unobtrusive integration of cloud-based and local parallel computing

M Mehrabi, N Giacaman, O Sinnen - Future Generation Computer Systems, 2021 - Elsevier
The growth in the data and computation need of today's operations has led to technical
solutions that distribute workload over several entities for better performance. To facilitate …

[PDF] A Fast Inter-Kernel Communication and Synchronization layer for MetalSVM

P Reble, S Lankes, C Clauss, T Bemmerl - MARC Symposium, 2011 - lfbs.rwth-aachen.de
In this paper, we present the basic concepts for a fast inter-kernel communication and
synchronization layer motivated by the realization of a SCC-related shared virtual memory …

Using shared arrays in message-driven parallel programs

P Miller, A Becker, L Kalé - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
This paper describes a safe and efficient combination of the object-based message-driven
execution and shared array parallel programming models. In particular, we demonstrate …