First experiences with Intel Cluster OpenMP

W Gropp, M Snir - Computing in Science & Engineering, 2013 - ieeexplore.ieee.org

Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …

Cited by 67 · Related articles · All 10 versions

Towards efficient remote OpenMP offloading

W Lu, B Shan, E Raut, J Meng, M Araya-Polo… - … Workshop on OpenMP, 2022 - Springer

On modern heterogeneous HPC systems, the most popular way to realize distributed
computation is the hybrid programming model of MPI+X (X being OpenMP/CUDA/etc.), as it …

Cited by 11 · Related articles · All 4 versions

MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation

B Shan, M Araya-Polo, AM Malik… - Proceedings of the 14th …, 2023 - dl.acm.org

MPI+X is the most popular hybrid programming model for distributed computation on
modern heterogeneous HPC systems. Nonetheless, for simplicity, HPC developers ideally …

Cited by 8 · Related articles · All 3 versions

The tiny-tasks granularity trade-off: Balancing overhead versus performance in parallel systems

S Bora, B Walker, M Fidler - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org

Models of parallel processing systems typically assume that one has workers and jobs are
split into an equal number of tasks. Splitting jobs into smaller tasks, i.e., using “tiny tasks”, can …

Cited by 4 · Related articles · All 7 versions

Efficient memory copy operations on the 48-core Intel SCC processor

MW Van Tol, R Bakker, M Verstraaten… - 3rd Many-core …, 2011 - books.google.com

The Single-chip Cloud Computer (SCC) is a 48-core experimental processor created by Intel
Labs targeting the many-core research community. It has hardware support for sending short …

Cited by 30 · Related articles · All 8 versions

How to scale nested OpenMP applications on the ScaleMP vSMP architecture

D Schmidl, C Terboven, A Wolf… - … on Cluster Computing, 2010 - ieeexplore.ieee.org

The novel ScaleMP vSMP architecture employs commodity x86-based servers with an
InfiniBand network to assemble a large shared memory system at an attractive price point …

Cited by 26 · Related articles · All 8 versions

Performance metrics in a hybrid MPI–OpenMP based molecular dynamics simulation with short-range interactions

A Pal, A Agarwala, S Raha, B Bhattacharya - Journal of Parallel and …, 2014 - Elsevier

We discuss the computational bottlenecks in molecular dynamics (MD) and describe the
challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm …

Cited by 18 · Related articles · All 11 versions

Unified programming concepts for unobtrusive integration of cloud-based and local parallel computing

M Mehrabi, N Giacaman, O Sinnen - Future Generation Computer Systems, 2021 - Elsevier

The growth in the data and computation need of today's operations has led to technical
solutions that distribute workload over several entities for better performance. To facilitate …

Cited by 5 · Related articles · All 2 versions

A Fast Inter-Kernel Communication and Synchronization layer for MetalSVM

P Reble, S Lankes, C Clauss, T Bemmerl - MARC Symposium, 2011 - lfbs.rwth-aachen.de

In this paper, we present the basic concepts for fast inter-kernel communication and
synchronization layer motivated by the realization of a SCC-related shared virtual memory …

Cited by 20 · Related articles · All 5 versions

Using shared arrays in message-driven parallel programs

P Miller, A Becker, L Kalé - 2011 IEEE International …, 2011 - ieeexplore.ieee.org

This paper describes a safe and efficient combination of the object-based message-driven
execution and shared array parallel programming models. In particular, we demonstrate …

Cited by 16 · Related articles · All 15 versions