Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP

N Kondratyuk, V Nikolskiy, D Pavlov… - … Journal of High …, 2021 - journals.sagepub.com
Classical molecular dynamics (MD) calculations represent a significant part of the utilization
time of high-performance computing systems. As usual, the efficiency of such calculations is …

Angara interconnect makes GPU-based Desmos supercomputer an efficient tool for molecular dynamics calculations

V Stegailov, E Dlinnova, T Ismagilov… - … Journal of High …, 2019 - journals.sagepub.com
In this article, we describe the Desmos supercomputer that consists of 32 hybrid nodes
connected by a low-latency high-bandwidth Angara interconnect with torus topology. This …

VASP hits the memory wall: Processors efficiency comparison

V Stegailov, G Smirnov, V Vecher - … and Computation: Practice …, 2019 - Wiley Online Library
First‐principles calculations of electronic structure have been one of the most important
classes of supercomputer applications for a long time. In this paper, we consider VASP as a …

AI-accelerated CFD simulation based on OpenFOAM and CPU/GPU computing

K Rojek, R Wyrzykowski, P Gepner - … , Krakow, Poland, June 16–18, 2021 …, 2021 - Springer
In this paper, we propose a method for accelerating CFD (computational fluid dynamics)
simulations by integrating a conventional CFD solver with our AI module. The investigated …

Early performance evaluation of the hybrid cluster with torus interconnect aimed at molecular-dynamics simulations

V Stegailov, A Agarkov, S Biryukov, T Ismagilov… - Parallel Processing and …, 2018 - Springer
In this paper, we describe the Desmos cluster that consists of 32 hybrid nodes connected by
a low-latency high-bandwidth torus interconnect. This cluster is aimed at cost-effective …

Performance and scalability of materials science and machine learning codes on the state-of-art hybrid supercomputer architecture

N Kondratyuk, G Smirnov, A Agarkov, A Osokin… - Russian …, 2019 - Springer
8 of top 10 supercomputers of Top500 list published in November 2018 consist of
computing nodes with hybrid architectures that require special programming techniques. 5 …

An study of the effect of process malleability in the energy efficiency on GPU-based clusters

S Iserte, K Rojek - The Journal of Supercomputing, 2020 - Springer
The adoption of graphic processor units (GPU) in high-performance computing (HPC)
infrastructures determines, in many cases, the energy consumption of those facilities. For …

Machine learning method for energy reduction by utilizing dynamic mixed precision on GPU‐based supercomputers

K Rojek - Concurrency and Computation: Practice and …, 2019 - Wiley Online Library
In this work, we propose a method that allows us to reduce energy consumption of an
application executed on supercomputing centers. The proposed method is based on a …

Matrix-matrix multiplication using multiple GPUs connected by NVLink

YR Choi, V Nikolskiy, V Stegailov - 2020 Global Smart Industry …, 2020 - ieeexplore.ieee.org
In this work we present an original GPU-only parallel matrix-matrix multiplication algorithm
(C = αA*B + βC) for servers with multiple GPUs connected by NVLink. The algorithm is …