On the performance portability of structured grid codes on many-core computer architectures
S McIntosh-Smith, M Boulton, D Curran… - … Conference, ISC 2014 …, 2014 - Springer
With the advent of many-core computer architectures such as GPGPUs from NVIDIA and
AMD, and more recently Intel's Xeon Phi, ensuring performance portability of HPC codes is …
Flexible system software scheduling for asymmetric multicore systems with PMCSched: A case for Intel Alder Lake
Asymmetric multicore processors (AMPs) couple high‐performance big cores and power‐
efficient small ones, all exposing a shared instruction set architecture to software, but with …
Performance analysis of tile low-rank cholesky factorization using parsec instrumentation tools
This paper highlights the necessary development of new instrumentation tools within the
PaRSEC task-based runtime system to leverage the performance of low-rank matrix …
Grain graphs: OpenMP performance analysis made easy
Average programmers struggle to solve performance problems in OpenMP programs with
tasks and parallel for-loops. Existing performance analysis tools visualize OpenMP task …
Rapid development of OS support with PMCSched for scheduling on asymmetric multicore systems
Asymmetric multicore processors (AMPs) couple high-performance big cores and power-
efficient small ones, all exposing a shared instruction set architecture to software, but with …
Evaluation of a performance portable lattice Boltzmann code using OpenCL
S McIntosh-Smith, D Curran - … of the International Workshop on OpenCL …, 2014 - dl.acm.org
With the advent of many-core computer architectures such as GPGPUs from NVIDIA and
AMD, and more recently Intel's Xeon Phi, ensuring performance portability of HPC codes is …
Extending OMPT to support grain graphs
PV Langdal, M Jahre, A Muddukrishna - Scaling OpenMP for Exascale …, 2017 - Springer
The upcoming profiling API standard OMPT can describe almost all profiling events required
to construct grain graphs, a recent visualization that simplifies OpenMP performance …
The secrets of the accelerators unveiled: Tracing heterogeneous executions through OMPT
G Llort, A Filgueras, D Jiménez-González… - … : Memory, Devices, and …, 2016 - Springer
Heterogeneous systems are an important trend in the future of supercomputers, yet they can
be hard to program and developers still lack powerful tools to gain understanding about how …
Parallelization of iterative methods to solve sparse linear systems using task based runtime systems on multi and many-core architectures: application to Multi-Level …
A Roussel - 2018 - theses.hal.science
Numerical methods in reservoir engineering simulations lead to the resolution of
unstructured, large and sparse linear systems. The performances of iterative methods …
Monitoring heterogeneous applications with the openmp tools interface
Heterogeneous systems are gaining more importance in supercomputing, yet they are
challenging to program and developers require support tools to understand how well their …