Automatic Benchmarking and Optimization of Codes: An Experience with Numerical Kernels.

JR Herrero, JJ Navarro - International Workshop on Applied Parallel …, 2006 - Springer

We present two implementations of dense matrix multiplication based on two different
non-canonical array layouts: one based on a hypermatrix data structure (HM) where data …

Cited by 13 - Related articles - All 11 versions

[PDF] psu.edu

Improving performance of hypermatrix Cholesky factorization

JR Herrero, JJ Navarro - European Conference on Parallel Processing, 2003 - Springer

This paper shows how a sparse hypermatrix Cholesky factorization can be improved. This is
accomplished by means of efficient codes which operate on very small dense matrices …

Cited by 20 - Related articles - All 12 versions

[PDF] upc.edu

[BOOK][B] A framework for efficient execution of matrix computations

JR Herrero Zaragoza - 2006 - upcommons.upc.edu

Matrix computations lie at the heart of most scientific computational tasks. The solution of
linear systems of equations is a very frequent operation in many fields in science …

Cited by 19 - Related articles - All 14 versions

[PDF] upc.edu

Compiler-optimized kernels: An efficient alternative to hand-coded inner kernels

JR Herrero, JJ Navarro - … Conference on Computational Science and Its …, 2006 - Springer

The use of highly optimized inner kernels is of paramount importance for obtaining efficient
numerical algorithms. Often, such kernels are created by hand. In this paper, however, we …

Cited by 13 - Related articles - All 12 versions

[PDF] researchgate.net

New data structures for matrices and specialized inner kernels: Low overhead for high performance

JR Herrero - International Conference on Parallel Processing and …, 2007 - Springer

Dense linear algebra codes are often expressed and coded in terms of BLAS calls. This
approach, however, achieves suboptimal performance due to the overheads associated to …

Cited by 9 - Related articles - All 11 versions

[PDF] academia.edu

Adapting linear algebra codes to the memory hierarchy using a hypermatrix scheme

JR Herrero, JJ Navarro - … Conference on Parallel Processing and Applied …, 2005 - Springer

LNCS 3911 - Adapting Linear Algebra Codes to the Memory Hierarchy Using a Hypermatrix
Scheme …

Cited by 6 - Related articles - All 15 versions

[PDF] psu.edu

[PDF] Sparse Hypermatrix Cholesky: Customization for High Performance.

JR Herrero, JJ Navarro - IMECS, 2006 - Citeseer

Efficient execution of numerical algorithms requires adapting the code to the underlying
execution platform. In this paper we show the process of fine tuning our sparse Hypermatrix …

Cited by 4 - Related articles - All 9 versions

[PDF] upc.edu

A study on load imbalance in parallel hypermatrix multiplication using OpenMP

JR Herrero, JJ Navarro - … Conference on Parallel Processing and Applied …, 2005 - Springer

In this paper we present our work on the parallelization of a matrix multiplication code
based on the hypermatrix data structure. We have used OpenMP for the parallelization. We …

Cited by 3 - Related articles - All 17 versions

[PDF] researchgate.net

[PDF] Exposing inner kernels and block storage for fast parallel dense linear algebra codes

JR Herrero - 2008 - researchgate.net

Efficient execution on processors with multiple cores requires the exploitation of parallelism
within the processor. For many dense linear algebra codes this, in turn, requires the efficient …

Cited by 2 - Related articles - All 4 versions

[PDF] academia.edu

[PDF] Using nonlinear array layouts in dense matrix operations

JR Herrero, JJ Navarro - Workshop on State-of-the-Art in Scientific …, 2006 - academia.edu

Introduction: A bottom-up approach …

Cited by 2 - Related articles - All 7 versions