Communication-optimal parallel and sequential QR and LU factorizations

J Demmel, L Grigori, M Hoemmen, J Langou - SIAM Journal on Scientific …, 2012 - SIAM
We present parallel and sequential dense QR factorization algorithms that are both optimal
(up to polylogarithmic factors) in the amount of communication they perform and just as …

Programming matrix algorithms-by-blocks for thread-level parallelism

G Quintana-Ortí, ES Quintana-Ortí… - ACM Transactions on …, 2009 - dl.acm.org
With the emergence of thread-level parallelism as the primary means for continued
performance improvement, the programmability issue has reemerged as an obstacle to the …

Solving dense linear systems on platforms with multiple hardware accelerators

G Quintana-Ortí, FD Igual, ES Quintana-Ortí… - Proceedings of the 14th …, 2009 - dl.acm.org
In a previous PPoPP paper we showed how the FLAME methodology, combined with the
SuperMatrix runtime system, yields a simple yet powerful solution for programming dense …

Parallelizing dense and banded linear algebra libraries using SMPSs

RM Badia, JR Herrero, J Labarta… - Concurrency and …, 2009 - Wiley Online Library
The promise of future many‐core processors, with hundreds of threads running concurrently,
has led the developers of linear algebra libraries to rethink their design in order to extract …

Communication-avoiding parallel and sequential QR factorizations

J Demmel, L Grigori, M Hoemmen… - CoRR abs …, 2008 - digicoll.lib.berkeley.edu
We present parallel and sequential dense QR factorization algorithms that are optimized to
avoid communication. Some of these are novel, and some extend earlier work …

Managing the complexity of lookahead for LU factorization with pivoting

E Chan, R van de Geijn, A Chapman - … of the twenty-second annual ACM …, 2010 - dl.acm.org
We describe parallel implementations of LU factorization with pivoting for multicore
architectures. Implementations that differ in two different dimensions are discussed: (1) using …

Communication-optimal parallel and sequential QR and LU factorizations: theory and practice

J Demmel, L Grigori, M Hoemmen, J Langou - arXiv preprint arXiv …, 2008 - arxiv.org
We present parallel and sequential dense QR factorization algorithms that are both optimal
(up to polylogarithmic factors) in the amount of communication they perform, and just as …

Making programming synonymous with programming for linear algebra libraries

M Castillo, E Chan, FD Igual, R Mayo… - The University of Texas …, 2008 - Citeseer
We have invested heavily in hardware development but software tools and methods to use
the hardware continue to fall behind. The Sky is Falling. Panic is rampant. We respectfully …

Solving dense linear systems on accelerated multicore architectures

A Rémy - 2015 - theses.hal.science
In this PhD thesis, we study algorithms and implementations to accelerate the solution of
dense linear systems by using hybrid architectures with multicore processors and …

Matrix computations on graphics processors and clusters of GPUs

FD Igual Peña - 2011 - tdx.cat
Architectures for high-performance computing (HPC) based on
graphics accelerators (GPUs) have in recent years become an alternative …