Programming for exascale computers
Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …
Towards efficient remote OpenMP offloading
On modern heterogeneous HPC systems, the most popular way to realize distributed
computation is the hybrid programming model of MPI+X (X being OpenMP/CUDA/etc.), as it …
MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation
MPI+X is the most popular hybrid programming model for distributed computation on
modern heterogeneous HPC systems. Nonetheless, for simplicity, HPC developers ideally …
The tiny-tasks granularity trade-off: Balancing overhead versus performance in parallel systems
Models of parallel processing systems typically assume that one has workers and jobs are
split into an equal number of tasks. Splitting jobs into smaller tasks, i.e., using “tiny tasks”, can …
Efficient memory copy operations on the 48-core Intel SCC processor
The Single-chip Cloud Computer (SCC) is a 48-core experimental processor created by Intel
Labs targeting the many-core research community. It has hardware support for sending short …
How to scale nested OpenMP applications on the ScaleMP vSMP architecture
D Schmidl, C Terboven, A Wolf… - … on Cluster Computing, 2010 - ieeexplore.ieee.org
The novel ScaleMP vSMP architecture employs commodity x86-based servers with an
InfiniBand network to assemble a large shared memory system at an attractive price point …
Performance metrics in a hybrid MPI–OpenMP based molecular dynamics simulation with short-range interactions
We discuss the computational bottlenecks in molecular dynamics (MD) and describe the
challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm …
Unified programming concepts for unobtrusive integration of cloud-based and local parallel computing
M Mehrabi, N Giacaman, O Sinnen - Future Generation Computer Systems, 2021 - Elsevier
The growth in the data and computation need of today's operations has led to technical
solutions that distribute workload over several entities for better performance. To facilitate …
A Fast Inter-Kernel Communication and Synchronization layer for MetalSVM.
In this paper, we present the basic concepts for a fast inter-kernel communication and
synchronization layer motivated by the realization of a SCC-related shared virtual memory …
Using shared arrays in message-driven parallel programs
This paper describes a safe and efficient combination of the object-based message-driven
execution and shared array parallel programming models. In particular, we demonstrate …