Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP
N Kondratyuk, V Nikolskiy, D Pavlov… - … Journal of High …, 2021 - journals.sagepub.com
Classical molecular dynamics (MD) calculations represent a significant part of the utilization
time of high-performance computing systems. As usual, the efficiency of such calculations is …
Angara interconnect makes GPU-based Desmos supercomputer an efficient tool for molecular dynamics calculations
V Stegailov, E Dlinnova, T Ismagilov… - … Journal of High …, 2019 - journals.sagepub.com
In this article, we describe the Desmos supercomputer that consists of 32 hybrid nodes
connected by a low-latency high-bandwidth Angara interconnect with torus topology. This …
VASP hits the memory wall: Processors efficiency comparison
First‐principles calculations of electronic structure have been one of the most important
classes of supercomputer applications for a long time. In this paper, we consider VASP as a …
AI-accelerated CFD simulation based on OpenFOAM and CPU/GPU computing
In this paper, we propose a method for accelerating CFD (computational fluid dynamics)
simulations by integrating a conventional CFD solver with our AI module. The investigated …
Early performance evaluation of the hybrid cluster with torus interconnect aimed at molecular-dynamics simulations
V Stegailov, A Agarkov, S Biryukov, T Ismagilov… - Parallel Processing and …, 2018 - Springer
In this paper, we describe the Desmos cluster that consists of 32 hybrid nodes connected by
a low-latency high-bandwidth torus interconnect. This cluster is aimed at cost-effective …
Performance and scalability of materials science and machine learning codes on the state-of-art hybrid supercomputer architecture
N Kondratyuk, G Smirnov, A Agarkov, A Osokin… - Russian …, 2019 - Springer
8 of top 10 supercomputers of Top500 list published in November 2018 consist of
computing nodes with hybrid architectures that require special programming techniques. 5 …
An study of the effect of process malleability in the energy efficiency on GPU-based clusters
The adoption of graphic processor units (GPU) in high-performance computing (HPC)
infrastructures determines, in many cases, the energy consumption of those facilities. For …
Machine learning method for energy reduction by utilizing dynamic mixed precision on GPU‐based supercomputers
K Rojek - Concurrency and Computation: Practice and …, 2019 - Wiley Online Library
In this work, we propose a method that allows us to reduce energy consumption of an
application executed on supercomputing centers. The proposed method is based on a …
Matrix-matrix multiplication using multiple GPUs connected by NVLink
YR Choi, V Nikolskiy, V Stegailov - 2020 Global Smart Industry …, 2020 - ieeexplore.ieee.org
In this work we present an original GPU-only parallel matrix-matrix multiplication algorithm
(C = αA*B + βC) for servers with multiple GPUs connected by NVLink. The algorithm is …