On the instrumentation of OpenMP and OmpSs tasking constructs

S McIntosh-Smith, M Boulton, D Curran… - … Conference, ISC 2014 …, 2014 - Springer

With the advent of many-core computer architectures such as GPGPUs from NVIDIA and
AMD, and more recently Intel's Xeon Phi, ensuring performance portability of HPC codes is …

Cited by 59 · Related articles · All 9 versions

Flexible system software scheduling for asymmetric multicore systems with PMCSched: A case for Intel Alder Lake

C Bilbao, JC Saez… - … and Computation: Practice …, 2023 - Wiley Online Library

Asymmetric multicore processors (AMPs) couple high-performance big cores and power-efficient small ones, all exposing a shared instruction set architecture to software, but with …

Cited by 8 · Related articles · All 2 versions

Performance analysis of tile low-rank Cholesky factorization using PaRSEC instrumentation tools

Q Cao, Y Pei, T Herault, K Akbudak… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org

This paper highlights the necessary development of new instrumentation tools within the
PaRSEC task-based runtime system to leverage the performance of low-rank matrix …

Cited by 26 · Related articles · All 8 versions

Grain graphs: OpenMP performance analysis made easy

A Muddukrishna, PA Jonsson, A Podobas… - Proceedings of the 21st …, 2016 - dl.acm.org

Average programmers struggle to solve performance problems in OpenMP programs with
tasks and parallel for-loops. Existing performance analysis tools visualize OpenMP task …

Cited by 33 · Related articles · All 6 versions

Rapid development of OS support with PMCSched for scheduling on asymmetric multicore systems

C Bilbao, JC Saez, M Prieto-Matias - European Conference on Parallel …, 2022 - Springer

Asymmetric multicore processors (AMPs) couple high-performance big cores and power-efficient small ones, all exposing a shared instruction set architecture to software, but with …

Cited by 5 · Related articles · All 3 versions

Evaluation of a performance portable lattice Boltzmann code using OpenCL

S McIntosh-Smith, D Curran - … of the International Workshop on OpenCL …, 2014 - dl.acm.org

With the advent of many-core computer architectures such as GPGPUs from NVIDIA and
AMD, and more recently Intel's Xeon Phi, ensuring performance portability of HPC codes is …

Cited by 20 · Related articles · All 7 versions

Extending OMPT to support grain graphs

PV Langdal, M Jahre, A Muddukrishna - Scaling OpenMP for Exascale …, 2017 - Springer

The upcoming profiling API standard OMPT can describe almost all profiling events required
to construct grain graphs, a recent visualization that simplifies OpenMP performance …

Cited by 11 · Related articles · All 3 versions

The secrets of the accelerators unveiled: Tracing heterogeneous executions through OMPT

G Llort, A Filgueras, D Jiménez-González… - … : Memory, Devices, and …, 2016 - Springer

Heterogeneous systems are an important trend in the future of supercomputers, yet they can
be hard to program and developers still lack powerful tools to gain understanding about how …

Cited by 9 · Related articles · All 5 versions

Parallelization of iterative methods to solve sparse linear systems using task based runtime systems on multi and many-core architectures: application to Multi-Level …

A Roussel - 2018 - theses.hal.science

Numerical methods in reservoir engineering simulations lead to the resolution of
unstructured, large and sparse linear systems. The performances of iterative methods …

Cited by 2 · Related articles · All 4 versions

Monitoring heterogeneous applications with the OpenMP tools interface

M Wagner, G Llort, A Filgueras… - Tools for High …, 2017 - Springer

Heterogeneous systems are gaining more importance in supercomputing, yet they are
challenging to program and developers require support tools to understand how well their …

Cited by 2 · Related articles · All 5 versions