Using non-canonical array layouts in dense matrix operations

JR Herrero, JJ Navarro - International Workshop on Applied Parallel …, 2006 - Springer
We present two implementations of dense matrix multiplication based on two different non-
canonical array layouts: one based on a hypermatrix data structure (HM) where data …

Improving performance of hypermatrix Cholesky factorization

JR Herrero, JJ Navarro - European Conference on Parallel Processing, 2003 - Springer
This paper shows how a sparse hypermatrix Cholesky factorization can be improved. This is
accomplished by means of efficient codes which operate on very small dense matrices …

[BOOK][B] A framework for efficient execution of matrix computations

JR Herrero Zaragoza - 2006 - upcommons.upc.edu
Matrix computations lie at the heart of most scientific computational tasks. The solution of
linear systems of equations is a very frequent operation in many fields in science …

Compiler-optimized kernels: An efficient alternative to hand-coded inner kernels

JR Herrero, JJ Navarro - … Conference on Computational Science and Its …, 2006 - Springer
The use of highly optimized inner kernels is of paramount importance for obtaining efficient
numerical algorithms. Often, such kernels are created by hand. In this paper, however, we …

New data structures for matrices and specialized inner kernels: Low overhead for high performance

JR Herrero - International Conference on Parallel Processing and …, 2007 - Springer
Dense linear algebra codes are often expressed and coded in terms of BLAS calls. This
approach, however, achieves suboptimal performance due to the overheads associated with …

Adapting linear algebra codes to the memory hierarchy using a hypermatrix scheme

JR Herrero, JJ Navarro - … Conference on Parallel Processing and Applied …, 2005 - Springer
LNCS 3911 - Adapting Linear Algebra Codes to the Memory Hierarchy Using a Hypermatrix
Scheme …

[PDF][PDF] Sparse Hypermatrix Cholesky: Customization for High Performance.

JR Herrero, JJ Navarro - IMECS, 2006 - Citeseer
Efficient execution of numerical algorithms requires adapting the code to the underlying
execution platform. In this paper we show the process of fine tuning our sparse Hypermatrix …

A study on load imbalance in parallel hypermatrix multiplication using OpenMP

JR Herrero, JJ Navarro - … Conference on Parallel Processing and Applied …, 2005 - Springer
In this paper we present our work on the parallelization of a matrix multiplication code
based on the hypermatrix data structure. We have used OpenMP for the parallelization. We …

[PDF][PDF] Exposing inner kernels and block storage for fast parallel dense linear algebra codes

JR Herrero - 2008 - researchgate.net
Efficient execution on processors with multiple cores requires the exploitation of parallelism
within the processor. For many dense linear algebra codes this, in turn, requires the efficient …

[PDF][PDF] Using nonlinear array layouts in dense matrix operations

JR Herrero, JJ Navarro - Workshop on State-of-the-Art in Scientific …, 2006 - academia.edu
Introduction: A bottom-up approach …