GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP

N Kondratyuk, V Nikolskiy, D Pavlov… - … Journal of High …, 2021 - journals.sagepub.com
Classical molecular dynamics (MD) calculations represent a significant part of the utilization
time of high-performance computing systems. As usual, the efficiency of such calculations is …

GPU algorithms for efficient exascale discretizations

A Abdelfattah, V Barra, N Beams, R Bleile, J Brown… - Parallel Computing, 2021 - Elsevier
In this paper we describe the research and development activities in the Center for Efficient
Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art …

HMG: Extending cache coherence protocols across modern hierarchical multi-GPU systems

X Ren, D Lustig, E Bolotin, A Jaleel… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Prior work on GPU cache coherence has shown that simple hardware- or software-based
protocols can be more than sufficient. However, in recent years, features such as multi-chip …

hipBone: A performance-portable graphics processing unit-accelerated C++ version of the NekBone benchmark

N Chalmers, A Mishra, D McDougall… - … Journal of High …, 2023 - journals.sagepub.com
We present hipBone, an open-source performance-portable proxy application for the
Nek5000 (and NekRS) computational fluid dynamics applications. HipBone is a fully GPU …

Strong scaling of OpenACC enabled Nek5000 on several GPU based HPC systems

J Vincent, J Gong, M Karp, A Peplinski… - … Conference on High …, 2022 - dl.acm.org
We present new results on the strong parallel scaling for the OpenACC-accelerated
implementation of the high-order spectral element fluid dynamics solver Nek5000. The test …

Tools for top-down performance analysis of GPU-accelerated applications

K Zhou, MW Krentel, J Mellor-Crummey - Proceedings of the 34th ACM …, 2020 - dl.acm.org
This paper describes extensions to Rice University's HPCToolkit performance tools to
support measurement and analysis of GPU-accelerated applications. To help developers …

On the performance portability of OpenACC, OpenMP, Kokkos and RAJA

A Marowka - … Conference on High Performance Computing in Asia …, 2022 - dl.acm.org
Performance Portability frameworks are becoming more central and essential in
heterogeneous computing systems. However, the developer toolbox lacks the tools to …

Exploring the acceleration of Nekbone on reconfigurable architectures

N Brown - 2020 IEEE/ACM International Workshop on …, 2020 - ieeexplore.ieee.org
Hardware technological advances are struggling to match scientific ambition, and a key
question is how we can use the transistors that we already have more effectively. This is …

High-performance spectral element methods on field-programmable gate arrays: implementation, evaluation, and future projection

M Karp, A Podobas, N Jansson, T Kenter… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Improvements in computer systems have historically relied on two well-known observations:
Moore's law and Dennard's scaling. Today, both these observations are ending, forcing …

Accelerated CPU–GPUs implementations for quaternion polar harmonic transform of color images

A Salah, K Li, KM Hosny, MM Darwish… - Future Generation …, 2020 - Elsevier
Image moments are used to capture image features. Moments are successfully used in
object descriptions, recognition, and other applications. However, image representation and …