On the performance portability of structured grid codes on many-core computer architectures

S McIntosh-Smith, M Boulton, D Curran… - … Conference, ISC 2014 …, 2014 - Springer
With the advent of many-core computer architectures such as GPGPUs from NVIDIA and
AMD, and more recently Intel's Xeon Phi, ensuring performance portability of HPC codes is …

Flexible system software scheduling for asymmetric multicore systems with PMCSched: A case for Intel Alder Lake

C Bilbao, JC Saez… - … and Computation: Practice …, 2023 - Wiley Online Library
Asymmetric multicore processors (AMPs) couple high‐performance big cores and power‐
efficient small ones, all exposing a shared instruction set architecture to software, but with …

Performance analysis of tile low-rank Cholesky factorization using PaRSEC instrumentation tools

Q Cao, Y Pei, T Herault, K Akbudak… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper highlights the necessary development of new instrumentation tools within the
PaRSEC task-based runtime system to leverage the performance of low-rank matrix …

Grain graphs: OpenMP performance analysis made easy

A Muddukrishna, PA Jonsson, A Podobas… - Proceedings of the 21st …, 2016 - dl.acm.org
Average programmers struggle to solve performance problems in OpenMP programs with
tasks and parallel for-loops. Existing performance analysis tools visualize OpenMP task …

Rapid development of OS support with PMCSched for scheduling on asymmetric multicore systems

C Bilbao, JC Saez, M Prieto-Matias - European Conference on Parallel …, 2022 - Springer
Asymmetric multicore processors (AMPs) couple high-performance big cores and power-
efficient small ones, all exposing a shared instruction set architecture to software, but with …

Evaluation of a performance portable lattice Boltzmann code using OpenCL

S McIntosh-Smith, D Curran - … of the International Workshop on OpenCL …, 2014 - dl.acm.org
With the advent of many-core computer architectures such as GPGPUs from NVIDIA and
AMD, and more recently Intel's Xeon Phi, ensuring performance portability of HPC codes is …

Extending OMPT to support grain graphs

PV Langdal, M Jahre, A Muddukrishna - Scaling OpenMP for Exascale …, 2017 - Springer
The upcoming profiling API standard OMPT can describe almost all profiling events required
to construct grain graphs, a recent visualization that simplifies OpenMP performance …

The secrets of the accelerators unveiled: Tracing heterogeneous executions through OMPT

G Llort, A Filgueras, D Jiménez-González… - … : Memory, Devices, and …, 2016 - Springer
Heterogeneous systems are an important trend in the future of supercomputers, yet they can
be hard to program and developers still lack powerful tools to gain understanding about how …

Parallelization of iterative methods to solve sparse linear systems using task based runtime systems on multi and many-core architectures: application to Multi-Level …

A Roussel - 2018 - theses.hal.science
Numerical methods in reservoir engineering simulations lead to the resolution of
unstructured, large and sparse linear systems. The performances of iterative methods …

Monitoring heterogeneous applications with the OpenMP tools interface

M Wagner, G Llort, A Filgueras… - Tools for High …, 2017 - Springer
Heterogeneous systems are gaining more importance in supercomputing, yet they are
challenging to program and developers require support tools to understand how well their …