Parallelizing dense and banded linear algebra libraries using SMPSs

T Gautier, JVF Lima, N Maillard… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org

Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and
accelerators, like GPUs. Programming such nodes is typically based on a combination of …

Cited by 275 - Related articles - All 21 versions

[PDF] utk.edu

A survey of recent developments in parallel implementations of Gaussian elimination

S Donfack, J Dongarra, M Faverge… - Concurrency and …, 2015 - Wiley Online Library

Gaussian elimination is a canonical linear algebra procedure for solving linear systems of
equations. In the last few years, the algorithm has received a lot of attention in an attempt to …

Cited by 42 - Related articles - All 10 versions

Tools for power-energy modelling and analysis of parallel scientific applications

P Alonso, RM Badia, J Labarta… - 2012 41st …, 2012 - ieeexplore.ieee.org

Understanding power usage in parallel workloads is crucial to develop the energy-aware
software that will run in future Exascale systems. In this paper, we contribute towards this …

Cited by 92 - Related articles - All 4 versions

[PDF] hal.science

Implementing multifrontal sparse solvers for multicore architectures with sequential task flow runtime systems

E Agullo, A Buttari, A Guermouche… - ACM Transactions on …, 2016 - dl.acm.org

To face the advent of multicore processors and the ever increasing complexity of hardware
architectures, programming models based on DAG parallelism regained popularity in the …

Cited by 68 - Related articles - All 6 versions

[PDF] susu.ru

Parallel programming models for dense linear algebra on heterogeneous systems

J Dongarra, M Abalenkovs, A Abdelfattah… - Supercomputing …, 2015 - superfri.susu.ru

We present a review of the current best practices in parallel programming models for dense
linear algebra (DLA) on heterogeneous architectures. We consider multicore CPUs, stand …

Cited by 62 - Related articles - All 16 versions

[PDF] netlib.org

Porting the PLASMA numerical library to the OpenMP standard

A YarKhan, J Kurzak, P Luszczek… - International Journal of …, 2017 - Springer

PLASMA is a numerical library intended as a successor to LAPACK for solving problems in
dense linear algebra on multicore processors. PLASMA relies on the QUARK scheduler for …

Cited by 51 - Related articles - All 10 versions

[PDF] osti.gov

Improving performance of GMRES by reducing communication and pipelining global collectives

I Yamazaki, M Hoemmen, P Luszczek… - 2017 IEEE …, 2017 - ieeexplore.ieee.org

We compare the performance of pipelined and s-step GMRES, respectively referred to as
l-GMRES and s-GMRES, on distributed multicore CPUs. Compared to standard GMRES, s …

Cited by 36 - Related articles - All 7 versions

libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms

F Broquedis, T Gautier, V Danjean - … on OpenMP, IWOMP 2012, Rome, Italy …, 2012 - Springer

To efficiently exploit high performance computing platforms, applications currently have to
express more and more finer-grain parallelism. The OpenMP standard allows programmers …

Cited by 60 - Related articles - All 11 versions

[PDF] hal.science

Multifrontal QR factorization for multicore architectures over runtime systems

E Agullo, A Buttari, A Guermouche, F Lopez - Euro-Par 2013 Parallel …, 2013 - Springer

To face the advent of multicore processors and the ever increasing complexity of hardware
architectures, programming models based on DAG parallelism regained popularity in the …

Cited by 56 - Related articles - All 11 versions

[PDF] aspdac.com

A lightweight OpenMP4 run-time for embedded systems

RE Vargas, S Royuela, MA Serrano… - 2016 21st Asia and …, 2016 - ieeexplore.ieee.org

OpenMP is increasingly being adopted by current many-core embedded processors to
exploit their parallel computation capabilities. Unfortunately, current run-time …

Cited by 37 - Related articles - All 8 versions