Loop transformation recipes for code generation and auto-tuning

M Hall, J Chame, C Chen, J Shin, G Rudy… - … and Compilers for …, 2010 - Springer
In this paper, we describe transformation recipes, which provide a high-level interface to the
code transformation and code generation capability of a compiler. These recipes can be …

Speeding up nek5000 with autotuning and specialization

J Shin, MW Hall, J Chame, C Chen, PF Fischer… - Proceedings of the 24th …, 2010 - dl.acm.org
Autotuning technology has emerged recently as a systematic process for evaluating
alternative implementations of a computation, in order to select the best-performing solution …

Autotuning and specialization: Speeding up matrix multiply for small matrices with compiler technology

J Shin, MW Hall, J Chame, C Chen… - … to State-of-the-Art Results, 2010 - Springer
Autotuning technology has emerged recently as a systematic process for evaluating
alternative implementations of a computation to select the best-performing solution for a …

[BOOK] CUDA-CHiLL: A programming language interface for GPGPU optimizations and code generation

G Rudy - 2010 - search.proquest.com
The advent of the era of cheap and pervasive many-core and multicore parallel systems has
highlighted the disparity of the performance achieved between novice and expert …

Improving high-performance sparse libraries using compiler-assisted specialization: A PETSc case study

S Ramalingam, M Hall, C Chen - 2012 IEEE 26th International …, 2012 - ieeexplore.ieee.org
Scientific libraries are written in a general way in anticipation of a variety of use cases that
reduce optimization opportunities. Significant performance gains can be achieved by …

Analysis of a sparse hypermatrix Cholesky with fixed-sized blocking

JR Herrero, JJ Navarro - Applicable Algebra in Engineering …, 2007 - Springer
We present the way in which we have constructed an implementation of a sparse Cholesky
factorization based on a hypermatrix data structure. This data structure is a storage scheme …

[BOOK] A framework for efficient execution of matrix computations

JR Herrero Zaragoza - 2006 - upcommons.upc.edu
Matrix computations lie at the heart of most scientific computational tasks. The solution of
linear systems of equations is a very frequent operation in many fields in science …

Compiler-optimized kernels: An efficient alternative to hand-coded inner kernels

JR Herrero, JJ Navarro - … Conference on Computational Science and Its …, 2006 - Springer
The use of highly optimized inner kernels is of paramount importance for obtaining efficient
numerical algorithms. Often, such kernels are created by hand. In this paper, however, we …

A simulation study of effect of fish body size on communication performance in a fish farm monitoring environment

Y Taniguchi - 2016 European Modelling Symposium (EMS), 2016 - ieeexplore.ieee.org
In recent years, sensor network technologies have attracted attention in primary industries such as
aquaculture. We have proposed a fish farm monitoring …

Adapting linear algebra codes to the memory hierarchy using a hypermatrix scheme

JR Herrero, JJ Navarro - … Conference on Parallel Processing and Applied …, 2005 - Springer
LNCS 3911 - Adapting Linear Algebra Codes to the Memory Hierarchy Using a Hypermatrix Scheme …