Compiler-optimized kernels: An efficient alternative to hand-coded inner kernels

D Ernst, G Hager, J Thies… - The International Journal …, 2021 - journals.sagepub.com

General matrix-matrix multiplications with double-precision real and complex entries
(DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for square …

Cited by 19 · Related articles · All 16 versions

[PDF] upc.edu

Using non-canonical array layouts in dense matrix operations

JR Herrero, JJ Navarro - International Workshop on Applied Parallel …, 2006 - Springer

We present two implementations of dense matrix multiplication based on two different non-
canonical array layouts: one based on a hypermatrix data structure (HM) where data …

Cited by 13 · Related articles · All 11 versions

[PDF] upc.edu

[BOOK] A framework for efficient execution of matrix computations

JR Herrero Zaragoza - 2006 - upcommons.upc.edu

Matrix computations lie at the heart of most scientific computational tasks. The solution of
linear systems of equations is a very frequent operation in many fields in science …

Cited by 19 · Related articles · All 14 versions

[PDF] researchgate.net

New data structures for matrices and specialized inner kernels: Low overhead for high performance

JR Herrero - International Conference on Parallel Processing and …, 2007 - Springer

Dense linear algebra codes are often expressed and coded in terms of BLAS calls. This
approach, however, achieves suboptimal performance due to the overheads associated to …

Cited by 9 · Related articles · All 11 versions

[PDF] netlib.org

Level-3 Cholesky factorization routines improve performance of many Cholesky algorithms

FG Gustavson, J Waśniewski, JJ Dongarra… - ACM Transactions on …, 2013 - dl.acm.org

Four routines called DPOTF3i, i = a, b, c, d, are presented. DPOTF3i are a novel type of level-3
BLAS for use by BPF (Blocked Packed Format) Cholesky factorization and LAPACK …

Cited by 4 · Related articles · All 13 versions

[PDF] academia.edu

[BOOK] Level-3 Cholesky Factorization Routines as Part of Many Cholesky Algorithms

FG Gustavson, J Wasniewski, JJ Dongarra, JR Herrero… - 2011 - academia.edu

Some Linear Algebra Libraries use Level-2 routines during the factorization part of any
Level-3 block factorization algorithm. We discuss four Level-3 routines called DPOTF3i, i = a …

Cited by 4 · Related articles · All 12 versions

[PDF] researchgate.net

[PDF] Exposing inner kernels and block storage for fast parallel dense linear algebra codes

JR Herrero - 2008 - researchgate.net

Efficient execution on processors with multiple cores requires the exploitation of parallelism
within the processor. For many dense linear algebra codes this, in turn, requires the efficient …

Cited by 2 · Related articles · All 4 versions

[PDF] academia.edu

[PDF] Using nonlinear array layouts in dense matrix operations

JR Herrero, JJ Navarro - Workshop on State-of-the-Art in Scientific …, 2006 - academia.edu

Using nonlinear array layouts in dense matrix operations. JR Herrero. Introduction: A bottom-up approach …

Cited by 2 · Related articles · All 7 versions

[PDF] upc.edu

A square block format for symmetric band matrices

FG Gustavson, JR Herrero, E Morancho - International Conference on …, 2013 - Springer

This contribution describes a Square Block, SB, format for storing a banded symmetric
matrix. This is possible by rearranging “in place” LAPACK Band Layout to become a SB …

Cited by 1 · Related articles · All 6 versions

New level-3 BLAS kernels for Cholesky factorization

FG Gustavson, J Waśniewski, JR Herrero - Parallel Processing and …, 2012 - Springer

Some Linear Algebra Libraries use Level-2 routines during the factorization part of
any Level-3 block factorization algorithm. We discuss four Level-3 routines called DPOTF3 …

Cited by 1 · Related articles · All 5 versions