A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with Kernel Tuning Toolkit
In recent years, the heterogeneity of both commodity and supercomputer hardware has
increased sharply. Accelerators, such as GPUs or Intel Xeon Phi co-processors, are often …
Autotuning of OpenCL kernels with global optimizations
J Filipovič, F Petrovič, S Benkner - Proceedings of the 1st workshop on …, 2017 - dl.acm.org
Autotuning is an important method for automatically exploring code optimizations. It may
target low-level code optimizations, such as memory blocking, loop unrolling or memory …
Tuning OpenCL applications with the periscope tuning framework
E Bajrovic, R Mijakovic, J Dokulil… - 2016 49th Hawaii …, 2016 - ieeexplore.ieee.org
Due to the complexity and diversity of new parallel architectures automatic tuning of parallel
applications has become increasingly important for achieving acceptable performance …
Analyzing performance properties collected by the PerSyst scalable HPC monitoring tool
D Brayford, C Bernau, W Hesse, C Guillen - arXiv preprint arXiv …, 2020 - arxiv.org
The ability to understand how a scientific application is executed on a large HPC system is
of great importance in allocating resources within the HPC data center. In this paper, we …
[PDF][PDF] Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework
R Dietrich - 2019 - core.ac.uk
The efficient parallel execution of scientific applications is a key challenge in high-
performance computing (HPC). With growing parallelism and heterogeneity of compute …
Pipeline patterns on top of task-based runtimes
E Bajrovic, S Benkner, J Dokulil - … , PDCAT 2018, Jeju Island, South Korea …, 2019 - Springer
Task-based runtime systems have gained a lot of interest in recent years since they support
separating the specification of parallel computations from the concrete mapping onto a …
Automatic Selection of Tuning Plugins in PTF Using Machine Learning
R Mijaković, M Gerndt - 2020 IEEE International Parallel and …, 2020 - ieeexplore.ieee.org
Performance tuning of scientific codes often requires tuning many different aspects like
vectorization, OpenMP synchronization, MPI communication, and load balancing. The …
Scalable Applications on Heterogeneous System Architectures
R Dietrich - tud.qucosa.de
Abstract (EN) The efficient parallel execution of scientific applications is a key challenge in
high-performance computing (HPC). With growing parallelism and heterogeneity of compute …
[PDF][PDF] Finding the needle in a haystack: chasing rarely occurring bugs in concurrent software
F Ullah - 2016 - research-collection.ethz.ch
Ensuring software correctness is a challenging task to accomplish with the increasingly
large and complex functional as well as non-functional requirements of modern software …
A Time-Cost Based Automatic Scheduling Framework for Matrix Computation on Various Distributed Computing Platforms
R Gu, Z Liu, C Yuan, Y Huang - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Matrix computation is considered to be the core of many machine learning and graph
algorithm workloads. In traditional single-node age, numerical analysis platforms like R and …