A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with Kernel Tuning Toolkit

F Petrovič, D Střelák, J Hozzová, J Ol'ha… - Future Generation …, 2020 - Elsevier
In recent years, the heterogeneity of both commodity and supercomputers hardware has
increased sharply. Accelerators, such as GPUs or Intel Xeon Phi co-processors, are often …

Autotuning of OpenCL kernels with global optimizations

J Filipovič, F Petrovič, S Benkner - Proceedings of the 1st workshop on …, 2017 - dl.acm.org
Autotuning is an important method for automatically exploring code optimizations. It may
target low-level code optimizations, such as memory blocking, loop unrolling or memory …

Tuning OpenCL applications with the Periscope Tuning Framework

E Bajrovic, R Mijakovic, J Dokulil… - 2016 49th Hawaii …, 2016 - ieeexplore.ieee.org
Due to the complexity and diversity of new parallel architectures automatic tuning of parallel
applications has become increasingly important for achieving acceptable performance …

Analyzing performance properties collected by the PerSyst scalable HPC monitoring tool

D Brayford, C Bernau, W Hesse, C Guillen - arXiv preprint arXiv …, 2020 - arxiv.org
The ability to understand how a scientific application is executed on a large HPC system is
of great importance in allocating resources within the HPC data center. In this paper, we …

Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

R Dietrich - 2019 - core.ac.uk
The efficient parallel execution of scientific applications is a key challenge in high-
performance computing (HPC). With growing parallelism and heterogeneity of compute …

Pipeline patterns on top of task-based runtimes

E Bajrovic, S Benkner, J Dokulil - … , PDCAT 2018, Jeju Island, South Korea …, 2019 - Springer
Task-based runtime systems have gained a lot of interest in recent years since they support
separating the specification of parallel computations from the concrete mapping onto a …

Automatic Selection of Tuning Plugins in PTF Using Machine Learning

R Mijaković, M Gerndt - 2020 IEEE International Parallel and …, 2020 - ieeexplore.ieee.org
Performance tuning of scientific codes often requires tuning many different aspects like
vectorization, OpenMP synchronization, MPI communication, and load balancing. The …

Scalable Applications on Heterogeneous System Architectures

R Dietrich - tud.qucosa.de
The efficient parallel execution of scientific applications is a key challenge in
high-performance computing (HPC). With growing parallelism and heterogeneity of compute …

Finding the needle in a haystack: chasing rarely occurring bugs in concurrent software

F Ullah - 2016 - research-collection.ethz.ch
Ensuring software correctness is a challenging task to accomplish with the increasingly
large and complex functional as well as non-functional requirements of modern software …

A Time-Cost Based Automatic Scheduling Framework for Matrix Computation on Various Distributed Computing Platforms

R Gu, Z Liu, C Yuan, Y Huang - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Matrix computation is considered to be the core of many machine learning and graph
algorithm workloads. In traditional single-node age, numerical analysis platforms like R and …