Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
Applying the roofline model
G Ofenbeck, R Steinmann, V Caparros… - … Analysis of Systems …, 2014 - ieeexplore.ieee.org
The recently introduced roofline model plots the performance of executed code against its
operational intensity (operations count divided by memory traffic). It also includes two …
Bi-objective optimization of data-parallel applications on heterogeneous HPC platforms for performance and energy through workload distribution
Performance and energy are the two most important objectives for optimization on modern
parallel platforms. In this article, we show that moving from single-objective optimization for …
Performance Modeling for FPGAs: Extending the Roofline Model with High‐Level Synthesis Tools
The potential of FPGAs as accelerators for high‐performance computing applications is very
large, but many factors are involved in their performance. The design for FPGAs and the …
Sea-land segmentation using deep learning techniques for landsat-8 OLI imagery
Automated coastline extraction from optical satellites is fundamental to coastal mapping, and
sea-land segmentation is the core technology of coastline extraction. Deep convolutional …
Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling
Optimization, portability and development of GPGPU applications are not trivial tasks, since
the capabilities and organization of GPU processing elements and memory subsystem …
Optimization of parallel iterated local search algorithms on graphics processing unit
Y Zhou, F He, Y Qiu - The Journal of Supercomputing, 2016 - Springer
Local search metaheuristics (LSMs) are efficient methods for solving hard optimization
problems in science, engineering, economics and technology. By using LSMs, we could …
A multi-GPU accelerated parallel domain decomposition one-step leapfrog ADI-FDTD
S Liu, B Zou, L Zhang, S Ren - IEEE Antennas and Wireless …, 2020 - ieeexplore.ieee.org
In this letter, a multi-GPU accelerated one-step leapfrog alternative-direction-implicit finite-
difference time-domain (ADI-FDTD) based on parallel SPIKE tridiagonal systems solver is …
Efficient sparse-dense matrix-matrix multiplication on GPUs using the customized sparse storage format
Multiplication of a sparse matrix to a dense matrix (SpDM) is widely used in many areas like
scientific computing and machine learning. However, existing work under-looks the …
FPGA design space exploration for scientific HPC applications using a fast and accurate cost model based on roofline analysis
SW Nabi, W Vanderbauwhede - Journal of Parallel and Distributed …, 2019 - Elsevier
High-performance computing on heterogeneous platforms in general and those with FPGAs
in particular presents a significant programming challenge. We contend that compiler …